Recall is important. Google likes to boast about its recall by telling you that you are looking at results 1-10 of x million, where x is usually unfeasibly large (more on this rather bogus figure in later posts). Aside from recall, what matters to Google is the relevance of the top results (this is the precision bit). PageRank, which plays a big part in deciding how far up Google your results appear, is their secret sauce, although Larry and Sergey’s original paper is public knowledge.
It’s worth emphasizing that it’s the top results that count for Google. Very few people click through to pages 2, 3 and beyond, so precision is judged on a very narrow slice: in effect, the top 20 or so results are the only ones that matter for any search request. Of course, automatically picking the best 20 results out of millions for each search request is incredibly difficult, but that is Google’s challenge and I for one don’t feel that sorry for them.
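To make the "only the top results count" idea concrete, here is a minimal sketch of precision and recall measured over the first k results of a ranked list. The function names and the sample ranking are made up for illustration; this is not how Google (or anyone else) actually scores results.

```python
def precision_at_k(ranked_relevance, k):
    """Fraction of the first k results that are relevant."""
    top = ranked_relevance[:k]
    return sum(top) / len(top) if top else 0.0


def recall_at_k(ranked_relevance, k, total_relevant):
    """Fraction of all relevant documents that show up in the first k results."""
    if total_relevant == 0:
        return 0.0
    return sum(ranked_relevance[:k]) / total_relevant


# Hypothetical ranking: True means the result at that position is relevant.
ranking = [True, True, False, True, False, False, True, False, False, False]

# Assume (for illustration) that 8 relevant documents exist in the whole index.
p5 = precision_at_k(ranking, 5)
r5 = recall_at_k(ranking, 5, total_relevant=8)
print(f"precision@5 = {p5:.2f}, recall@5 = {r5:.3f}")
```

A search engine tuned for the first page cares almost entirely about the first number; the long tail of results barely affects it.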
For us at Brandwatch, precision is different. Brandwatch analyses EVERYTHING that is said about a keyword. It’s like taking all the results from a Google search and trying to make sense of them. Now our index isn’t as large as the big G’s so I don’t want to set any false expectations, but that is a pretty good description of our challenge. To put it another way, result 1 million is as important to us at Brandwatch as result number one (we look out for the little guys :) ).
Although our challenge is different to the one Google faces, it’s another toughie. So far we’re doing pretty well: last week we had 78% precision, i.e. almost 8 out of 10 of the pages we analyse actually relate to the keywords we are tracking. Although not perfect, this is enough to derive good data about the topics of conversation, sentiment, and trends.
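For the curious, the arithmetic behind a figure like 78% is simple. The sketch below assumes a hand-checked sample of 100 pages in which each page is marked as relevant or not; the sample size and the `precision` helper are illustrative assumptions, not a description of our actual pipeline.

```python
def precision(labels):
    """Share of analysed pages that really relate to the tracked keyword.

    labels: iterable of booleans, True if the page is a genuine match.
    """
    labels = list(labels)
    if not labels:
        raise ValueError("need at least one labelled page")
    return sum(labels) / len(labels)


# Hypothetical sample: 78 good matches and 22 bad matches out of 100 pages.
sample = [True] * 78 + [False] * 22
print(f"precision = {precision(sample):.0%}")
```

Unlike the top-k case, every page in the sample counts equally here, whether it would have been result number one or result one million.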
Reputation Management and the other services we offer rely on good quality data, so when we talk about bad matches, and difficult keywords to isolate, it’s because unstructured data analysis is an obsession of ours that’s not going to go away.