As a social media monitoring company, we depend on the quality of our coverage: it is integral to our performance as a business. But what exactly does that mean?
Even that term – social media monitoring – isn’t quite accurate. We don’t just monitor the social web; we listen to all online activity.
Social media means different things to different people, and many would argue that the web has always been as social as it is in its current Twitter- and Facebook-infested form, through comparatively archaic tools like email, Usenet and IM.
The boundaries of what is social and what isn’t ultimately don’t matter too much to us. We crawl the entire web to make sure that if anyone is talking about you or what you’re interested in tracking, we’ll be able to find it.
So if we’re not just listening for content published on social sites, what does Brandwatch track?
This article should help you make a little more sense of exactly what the 60,000,000+ pages we crawl each day include.
Information is the currency of the digital age, and tracking articles published on news websites is one of the most important applications of our tool. Because Brandwatch covers thousands of the most important news sites, PR departments and campaign managers can easily keep tabs on which outlets their stories are reaching.
We take a blacklist approach to crawling: we attempt to crawl every news website there is (excluding those behind a paywall) and then remove spam and irrelevant mentions. The result is comprehensive coverage of major and minor news websites alike, from regional papers to international institutions.
As with news websites, all of our coverage relies on this blacklist method, so we crawl many thousands of forums before eliminating those that aren’t useful.
This is more thorough than a whitelist approach, which builds a list of useful sites up from nothing and so misses any site nobody thought to add. Starting from everything and removing what isn’t useful leaves far fewer gaps.
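To make the difference concrete, here is a minimal sketch contrasting the two filtering strategies. The URLs, hostnames and function names are hypothetical illustrations, not Brandwatch’s actual lists or code: the blacklist filter keeps everything except blocked hosts, while the whitelist filter keeps only hosts someone has already approved.

```python
from urllib.parse import urlparse

# Hypothetical discovered URLs; not real crawl data.
DISCOVERED_URLS = [
    "https://regional-paper.example/story-1",
    "https://spam-farm.example/buy-now",
    "https://national-news.example/politics/42",
    "https://paywalled-daily.example/exclusive",
]


def host(url: str) -> str:
    """Return the hostname of a URL, or an empty string if it has none."""
    return urlparse(url).hostname or ""


def blacklist_filter(urls, blocked_hosts):
    """Start from everything and drop only URLs on a blocked host."""
    return [u for u in urls if host(u) not in blocked_hosts]


def whitelist_filter(urls, allowed_hosts):
    """Start from nothing and keep only URLs on an approved host."""
    return [u for u in urls if host(u) in allowed_hosts]


if __name__ == "__main__":
    blocked = {"spam-farm.example", "paywalled-daily.example"}
    allowed = {"national-news.example"}
    print(blacklist_filter(DISCOVERED_URLS, blocked))
    print(whitelist_filter(DISCOVERED_URLS, allowed))
```

In this toy run the regional paper survives the blacklist filter even though nobody listed it, whereas the whitelist filter silently drops it because it was never added to the approved set.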
We’re also able to extract individual comments on forum threads, and the only forums we’re missing are those that have politely asked us not to crawl them or have set their content to private.
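The article doesn’t say how a forum “politely asks” not to be crawled. The usual web convention is a `robots.txt` file, so the sketch below assumes that mechanism and uses Python’s standard `urllib.robotparser` to check it. The crawler name `ExampleListeningBot` and the rules themselves are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the forum opts out of one named crawler entirely
# and keeps a /private/ section off-limits to every crawler.
ROBOTS_TXT = """\
User-agent: ExampleListeningBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed_to_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed_to_crawl(ROBOTS_TXT, "ExampleListeningBot", "https://forum.example/thread/1"))
    print(allowed_to_crawl(ROBOTS_TXT, "OtherBot", "https://forum.example/thread/1"))
    print(allowed_to_crawl(ROBOTS_TXT, "OtherBot", "https://forum.example/private/x"))
```

A crawler following this convention would skip every page for the opted-out bot, while other crawlers could still fetch public threads but not the private section.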
Our coverage can also include image boards such as 4chan, social bookmarking sites like StumbleUpon and even review sites like TripAdvisor.
Social networks make up a significant share of the content our clients are most interested in listening to, hence the name ‘social media monitoring’.
Sites like LinkedIn and Facebook are notoriously difficult to retrieve data from, as both networks have stringent scraping rules and privacy controls that prevent us from collecting everything published on their platforms.
We do, however, have relationships with a number of the key networks to make our coverage as good as it can possibly be, and in some cases these relationships can give us 100% access to their social data.
Above is a selection of the websites we are able to crawl to some extent. Some come with restrictions: on LinkedIn, for example, content such as profile pages is strictly off-limits to listening companies like us.
We also take regional preferences into account when deciding which sites to crawl, such as the popularity of RenRen and Weibo in China, or of Orkut in Brazil and India.
A huge chunk of the internet is made up of blogs. These range from hubs of leading internet discourse to fierce diatribes against just about anything, and on to endless porn-focused spam disasters.
We use sophisticated systems to extract only the relevant material from sprawling blog networks like Tumblr, Blogspot and WordPress, producing a list of blogs to crawl that runs to tens of millions (an eight-digit number) and is updated every day.
We also make sure to include industry blogs, from corporate-produced articles to mainstream sites like Wired and TechCrunch.
As video and image-based content becomes more prevalent, we strive to make sure our coverage reflects that. While 100% coverage of these sites is again not feasible, for the same kinds of reasons that make other social networks difficult to cover entirely, we are able to extract a significant percentage of content from the following sites:
Other types of sites
Not all sites can be pigeonholed into predefined genres. Personal portfolios, review archives, corporate articles and other miscellaneous websites make up a large proportion of the sites online, and consequently of the sites we cover.
It’s tough to wedge all these sites under one single umbrella, but rest assured that if the site is reasonably significant (written by a human and with real visitors), we’ll be listening.
Alongside catering to regional markets in the sites we crawl, we’re also sensitive to the language that mentions are published in. We can track mentions in 25 languages, and we’re adding more every month. Our renowned sentiment analysis is also available for most of the languages we track.
- Arabic BETA
- Brazilian Portuguese
- Chinese (Simplified)
- Chinese (Traditional)
- Egyptian Arabic BETA
- European Portuguese
- Farsi BETA
- Gulf Arabic BETA
- Hebrew BETA
So you can now see just how much of the internet we’re able to cover, and what we have to consider when crawling the web. If you’d like to know more about how comprehensive our coverage is, the quality of our crawling for specific sites, how our spam filtering works or anything else about our data, please don’t hesitate to get in touch with us on Twitter, Facebook or by email.