Interview: Exploring Data Science at Brandwatch with Hamish Morgan
By Olivia Swain, Aug 23
With the proliferation of social data in the media and the availability of digital data in general, many are tempted to use it to publish analysis around major events or interesting stories.
And for good reason – the media are undoubtedly bewitched by reporting predictions using social data.
Let’s take last night’s Oscars ceremony as an example.
Putting a number, something tangible and real, on a byline about who might scoop the Best Actor or Actress statuette means guaranteed clicks (or indeed, news-stand sales). Predictions about who would triumph at the ceremony have been doing the rounds since last summer, when some of the films actually nominated hadn’t even seen the light of day.
However, simply publishing raw volumes of mentions, or ‘buzz’, around such things can invite dangerous conclusions, and, much like with sentiment data, as an industry we should take care to help contextualize and analyze data before others reach conclusions about the raw numbers unaided.
Context should be provided using additional information to further enrich the results and ensure that the social data is not devalued.
The social space is overflowing with examples of data where, without a consideration of the broader context, meaning can be misconstrued entirely – take the 2013 study, ‘Extracting Diurnal Patterns of Real World Activity from Social Media.’
The research combined Hurricane Sandy-related Twitter data and Foursquare data for the wider New York area, and revealed that during the peak of the storm, the majority of Tweets were actually from Manhattan, reflecting Twitter usage patterns rather than storm damage or impact.
Data promoted by Adobe Social and Hootsuite about last night’s Oscars is another example of this, illustrating how often, in the hands of the media without the context of analysis, erroneous conclusions can be drawn.
Both companies used total buzz figures – i.e. the most discussed overall – to make their predictions on who would take the golden prize home in each category. Take Hootsuite’s predictions as an example.
With 2.7 million mentions, American Sniper was the most talked-about motion picture overall. However, a little digging shows there are various reasons why this film was likely to be the most talked about – reasons that Hootsuite went into themselves.
“Several important considerations keep us from proclaiming American Sniper the ultimate winner… many mentions could be references to the book, as well as the controversial figure of the late Chris Kyle himself, or the influence of the film adaptation of Kyle’s life may have on the outcome of his accused murderer’s trial (which is still in progress).
Lastly, extensive media coverage of the conflict over royalties for the film adaptation of Kyle’s book has undoubtedly contributed to some of the social discussions.”
This is certainly not a criticism of those vendors, as both stated that the sheer volume of mentions alone was unlikely to have any bearing on what is essentially a closed-door panel decision rather than a public vote.
Furthermore, without an open dataset or publication of working, there is no unified way of comparing accuracy.
Even after accounting for differences in the quality of spam filters and varying search term inclusions and competencies, Hootsuite’s and Adobe’s data differ by an order of magnitude.
Though this year we opted not to delve too deeply into data around the Oscars, we did tweet a few graphs from our @PeerIndex account relating to the topic.
It’s more interesting and insightful to place published data within a context, and use more sophisticated searches to begin dissecting the data.
For example, segmenting the conversation by gender reveals that the split differs for each nominee, as shown for actors and for films in the tweets below.
— Brandwatch PeerIndex (@PeerIndex) February 23, 2015
Moreover, even when releasing raw ‘buzz’ figures, it is best to take care not to imply that they give an actor or actress any stronger chance of winning; they show simply that discussion is increasing and that nominees are discussed at varying volumes over time.
But what value can social data add to events like the Oscars?
Well, rather than track buzz alone, using Query language we can actually isolate just the conversation that directly focuses on explicit predictions.
— Brandwatch PeerIndex (@PeerIndex) February 23, 2015
This tends to have a closer correlation to the actual results, perhaps because ‘the wisdom of the crowd’ comes close to anticipating the opinions of the 5000-strong membership of Oscar voters.
This was demonstrated by analyzing the public conversation data for direct predictions in the week running up to the ceremony itself.
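To make the difference between raw buzz and explicit predictions concrete, here is a minimal sketch in Python. It is not Brandwatch Query language and not the method used for the analysis above: the prediction phrases, the sample posts and the names "Film B" and "Film C" are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical phrases that signal an explicit prediction. This list is an
# assumption for illustration, not an actual Brandwatch Query.
PREDICTION_PATTERN = re.compile(
    r"\b(will win|is going to win|my prediction|i predict)\b",
    re.IGNORECASE,
)


def count_buzz(mentions, nominees):
    """Count every mention that names a nominee, whatever its intent."""
    counts = Counter()
    for text in mentions:
        lowered = text.lower()
        for nominee in nominees:
            if nominee.lower() in lowered:
                counts[nominee] += 1
    return counts


def count_predictions(mentions, nominees):
    """Count only mentions that name a nominee AND contain a prediction phrase."""
    predictive = [m for m in mentions if PREDICTION_PATTERN.search(m)]
    return count_buzz(predictive, nominees)


# Made-up posts: the first three mention a film without predicting anything.
sample_mentions = [
    "Reading the American Sniper book this weekend",
    "The American Sniper royalties dispute is all over the news",
    "American Sniper trailer looks intense",
    "I predict Film B will win Best Picture",
    "Film B is going to win, calling it now",
    "My prediction: Film C for Best Picture",
]
nominees = ["American Sniper", "Film B", "Film C"]

buzz = count_buzz(sample_mentions, nominees)
predictions = count_predictions(sample_mentions, nominees)

print("Raw buzz:", buzz.most_common())
print("Explicit predictions:", predictions.most_common())
```

In this toy sample the most-mentioned film receives no explicit predictions at all, which is the gap between buzz counting and prediction tracking in miniature. A real query would need far richer phrasing rules, spam filtering and language handling.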
We have, however, delved into Oscars data before: in both 2013 and 2014 we worked with the Motion Picture Association of America (The Credits).
Instead of simply tracking total mentions, we divided the audience into a group of critics and general members of the public.
By listening to conversation from those groups, again targeted to only find direct predictions of the results, more meaningful insights could be drawn. These included examples of critics predicting one winner, the public predicting another, and the real winner being someone else entirely.
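The audience split described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the analysis we actually ran: the group labels, the nominee names and the winner are hypothetical, and each prediction is assumed to have already been tagged with its author's group.

```python
from collections import Counter, defaultdict


def top_prediction_by_group(predictions):
    """Return the most-predicted nominee for each audience group.

    `predictions` is an iterable of (group, nominee) pairs.
    """
    tallies = defaultdict(Counter)
    for group, nominee in predictions:
        tallies[group][nominee] += 1
    return {
        group: counts.most_common(1)[0][0]
        for group, counts in tallies.items()
    }


# Made-up, pre-tagged predictions for a single award category.
tagged_predictions = [
    ("critics", "Nominee A"),
    ("critics", "Nominee A"),
    ("critics", "Nominee B"),
    ("public", "Nominee B"),
    ("public", "Nominee B"),
    ("public", "Nominee C"),
]
actual_winner = "Nominee C"  # hypothetical result

picks = top_prediction_by_group(tagged_predictions)
called_it = {group: pick == actual_winner for group, pick in picks.items()}

print("Top pick per group:", picks)
print("Matched the winner:", called_it)
```

Here the critics favour one nominee, the public another, and the hypothetical winner is a third, mirroring the pattern described above. Keeping the groups separate is what makes that disagreement visible; a single pooled count would hide it.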
By being explicit about the type of conversation tracked, and by granting users the tools to dive into the data and discover their own insights, a better picture of the capabilities of social data could be drawn.
And, for what it’s worth, both the public and the critics predicted most of the winners correctly, anticipating 15 of the 18 tracked award winners (about 83%) in 2013. That is more than Nate Silver, the American statistician famous for correctly predicting the outcome of the most recent US election in all 50 states, who managed just 67%.
“While social media can’t tell us who will win, we can learn who would, if the internets had their way,” stated Hootsuite before this year’s ceremony.
They, and Adobe, were right to distance themselves from the more controversial conclusions drawn by some – such as the headlines proclaiming that total buzz equaled a win – but 2015 should be the year when social data matures from buzz counting and develops into a technique that genuinely uncovers insights and adds value to the topics it is pointed towards.
After all, data without context is really no story at all.