As I mentioned in the last post, the accuracy of the entire result set is critical to us at Brandwatch, but so too is making sure that we have as big a sample as possible.
We are increasing our crawl all the time (circa 100k new sites per month right now), and we regularly compare it with Google’s index to cross-check our total result count against theirs.
Whilst doing this I stumbled upon a rather surprising little secret.
If you search for a reasonably unusual keyphrase in Google, you will most likely get a very large number of results.
For example, searching for “social media analysis” apparently returned about 44,100 results at 10:30 GMT on 17 June 2008.
Wow, that’s a lot, I thought. And I’m sure I’m not the only one who sees those numbers for Google search results and thinks that.
So during my analysis I decided to check a sample of the top 1,000 results, which is all they will give you.
But there weren’t 1,000 results. There weren’t even 500. There were 330.
NB: this was with SafeSearch off and searching the whole web.
There was a link at the bottom of the last results page:
“In order to show you the most relevant results, we have omitted some entries very similar to the 332 already displayed.
If you like, you can repeat the search with the omitted results included.”
Following that link gave me more than 1,000 results, but a quick look showed lots and lots of duplicates.
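For anyone who wants to go beyond a quick look, the duplicate check can be roughed out in a few lines of Python. This is only a sketch under my own assumptions: the `normalise` rule (ignore scheme, `www.`, case, trailing slash and query string) is a deliberately crude guess at what counts as "the same page", and the example URLs are made up, not taken from the actual result set.

```python
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Reduce a URL to a rough canonical form (assumed, simplistic rule)."""
    parts = urlsplit(url.strip().lower())
    host = parts.netloc.removeprefix("www.")  # requires Python 3.9+
    path = parts.path.rstrip("/")
    # Scheme, query string and fragment are ignored on purpose.
    return f"{host}{path}"


def duplicate_share(urls: list[str]) -> float:
    """Fraction of results that repeat an already-seen normalised URL."""
    if not urls:
        return 0.0
    unique = {normalise(u) for u in urls}
    return 1 - len(unique) / len(urls)


# Hypothetical sample, for illustration only.
sample = [
    "http://example.com/a",
    "http://www.example.com/a/",
    "https://example.com/b",
    "http://example.org/",
]
print(f"{duplicate_share(sample):.0%} of the sample are duplicates")
```

A stricter or looser normalisation rule would change the figure, so treat the output as an indication rather than a measurement.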
I’m not sure how they get these enormous numbers, but they aren’t just a few percent off; they are orders of magnitude off. Although I don’t think people pay too much attention to them, it’s just wrong to put them up there. I was left with a sense of being spun (is that a British-only phrase?), or rather I felt slightly manipulated, and my trust in Google has gone down a couple of notches.
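To put a number on that gap, here is a back-of-the-envelope check in Python. It uses only the two figures quoted above (the headline estimate and the count of results I could actually page through); the variable names are mine.

```python
import math

reported = 44_100  # Google's headline estimate for the query
reachable = 330    # results that could actually be paged through

ratio = reported / reachable
orders = math.log10(ratio)

print(f"Headline count is about {ratio:.0f}x the reachable results")
print(f"That is roughly {orders:.1f} orders of magnitude")
```

On these figures the headline count is a little over a hundred times the number of results actually shown, which is about two orders of magnitude.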