I went to hear Nate Silver speak last Tuesday as part of his UK speaking tour. You may have heard of him: an American statistician, pollster and data nerd (I’m not mocking him: there was a bloke in the audience with that written on a T-shirt. I sort of coveted it.)
He correctly predicted the outcome of the most recent US election in all 50 states.
Silver is smart yet humble, and one of his key premises is that whilst so-called Big Data is a rich source of predictive insight, it's only as insightful as those analysing it. In other words, analysis counts and, without it, the use to which we put data is unacceptably fallible.
In his own words (from his recent book The Signal and the Noise, page 9):
“Data-driven predictions can succeed – and they can fail. It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves.”
Big Data is mentioned, it seems, at practically every conference I attend these days. And I attend quite a few.
It’s an exciting time in the data world. How we handle the volume, variety and velocity of Big Data, and the use to which we put it, is a question that many claim to have addressed with their great services or methodologies.
But can it be done at all?
The challenges within data
Challenges inherent within data – big or otherwise – have been listed by more statistically savvy minds than mine, not least Silver’s own. To summarise and paraphrase some of those challenges:
- Data without analysis lacks context
- Data creates bigger haystacks; there’s more ‘noise’ but not necessarily a greater number of ‘signals’ (insight, in other words)
- Datasets are never complete; modelling using an incomplete dataset increases the risk that you’re missing the key insight that will prove the exception to the rule and discount your theory
And, perhaps most importantly:
- There are biases contained within data, from its compilation through to its interpretation
Social media data
It’s a moot point whether social media data is actually ‘Big’. But let’s assume for a moment that it shares enough of Big Data’s characteristics to be taken as such, particularly in aggregate with other data sources.
It becomes clear that, despite the challenges, this data also has value because:
- It yields interesting results around consumer intention, if not prediction about their future behaviour
- Sourcing it is fast as well as relatively cheap, compared to your average polling methodology
- And manipulating it (using tools like Brandwatch) to gain insight is a fairly straightforward process
All of this returns us to our original point: without analysis, data will not magically yield insight. This is felt particularly keenly when analysing social sources.
It’s all about context
Context comes from additional information, such as metadata and biographical details, and from the analyst themselves. This context enriches the results and helps to create a story out of them.
The social space is filled to the brim with examples of data where, without a consideration of the broader context, meaning can be misconstrued entirely.
A recent study combining Hurricane Sandy-related Twitter data and Foursquare data for the wider New York area demonstrated this.
During the peak of the storm, the majority of tweets were actually from Manhattan, reflecting Twitter usage patterns rather than storm damage or impact. Correlation does not equal causation.
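The fix for this kind of trap is to normalise against each area’s baseline activity rather than comparing raw volumes. A minimal sketch of that idea in Python, with entirely illustrative figures (the place names and numbers below are hypothetical, not taken from the Sandy study):

```python
# Illustrative only: raw tweet counts during an event mirror where people
# normally tweet, not where the event hit hardest. Dividing observed
# activity by each area's baseline gives a fairer "departure from normal".

baseline = {"Manhattan": 50_000, "Outer boroughs": 2_000}      # typical daily tweets
during_storm = {"Manhattan": 60_000, "Outer boroughs": 5_000}  # tweets at the peak

def relative_uplift(baseline, observed):
    """Ratio of observed to expected (baseline) activity per area."""
    return {area: observed[area] / baseline[area] for area in baseline}

uplift = relative_uplift(baseline, during_storm)
print(uplift)  # {'Manhattan': 1.2, 'Outer boroughs': 2.5}
```

Manhattan still dominates in raw volume, but once you account for how much it tweets anyway, the quieter areas show the bigger departure from normal, which is closer to the signal you actually wanted.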
The very ambiguity of human emotion makes it difficult to analyse in aggregate, but social data, combined with analysis of that data, gives us verbatim insight and indicative qualitative findings into the minds of consumers, customers and clients: the end user, in other words.
Qualitative and quantitative
For those, like Nate Silver, who are interested in who’s going to win elections (though he did say last week that he’s a bit bored with talking about the recent US election), the combination of qualitative and quantitative insight tells a rich story that people love to hear: the battle for the hearts and minds of the American population in a highly polarised voter context.
An example closer to home would be to look at the impact of Twitter activity on David Cameron’s account (he doesn’t write his own tweets, or does he?)
We did a piece recently looking at this, and found that there is an optimum level of engagement, with diminishing returns when he (or one of his people) posts more frequently.
Of course, it’s also important to factor in what is published. People switch off when they reach saturation point (not just in the social sphere, right?), and this depends as much on what is said as on how frequently he (or anyone) speaks.
There is plenty of insight to be gained from analysing data, and I spend most of my day talking to clients about how to do this. But what I love about data – and this is an oft under-appreciated aspect of it – is that it’s actually about people.
Apart from the obvious exception of spam, behind every data point is a person who has something to say.
In the social space it is no longer an option not to listen; just be aware of assumptions, extrapolations and, of course, your own biases when doing so.