The Independent Group: The Hour-by-Hour Story of Launch Day in Social Data
By Gemma Joyce · Feb 19
On Monday 30th March 2015, the current UK parliament will be dissolved.
Five and a half weeks later, on Thursday 7th May, there will be a General Election to decide a new one. This is not prophecy; this is the Fixed-term Parliaments Act 2011, which means that, for the first time, we know these dates far in advance.
But what happens after 7th May is much harder to predict.
Can social data allow us any foresight?
One would think, with the steady rise in social media usage, and the generated data mountain growing ever taller and (importantly) broader, that the measurability of public opinion should be improving. Given the right tools and the right perspective, we should be able to read these trends with ever increasing acuity.
Surely it cannot be long before polling the electorate becomes a mere formality, with their opinions on all matters already so well advertised through the public channels they publish. It’s just a matter of finding the right way to mine that data.
I experimented with this idea back in May, when I attempted to second-guess the result of the 2014 European Election using Brandwatch data. Basing my analysis on Twitter conversations, I logged mentions of each of the five leading UK parties in the weeks, days and hours before polling.
I shared the data online as it was gathered, as well as placing my metaphorical balls on the chopping board by timestamping my prediction with a Tweet on the day of polling, three days ahead of the result announcement.
How accurate was it? Pretty bloody close.
There were two big headlines the morning of May 23rd 2014, UKIP’s large gains and the LibDems’ significant losses, both of which matched my prediction to within 1%. The data had predicted a result which at that point was highly uncertain. I wrote up the experiment focusing on these successes, quietly excusing the less accurate estimations of the Conservative, Labour and Green split.
My methodology for this experiment was based on the (spurious) premise that if 1) volume of mentions might be indicative of volume of interest, then 2) volume of interest might be indicative of the expected volume of votes. Either half of this two-step correlation was eminently debatable, but I was going to trust the data to decide its validity.
Volume of mentions = Volume of interest = Volume of votes
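The whole model reduces to a normalisation: each party's share of mentions becomes its predicted share of the vote. A minimal sketch, using hypothetical mention counts rather than the actual Brandwatch figures from the 2014 experiment:

```python
# Hypothetical mention counts per party -- illustrative only,
# NOT the actual Brandwatch figures from the 2014 experiment.
mentions = {
    "UKIP": 120_000,
    "Labour": 95_000,
    "Conservative": 80_000,
    "LibDem": 30_000,
    "Green": 25_000,
}

total = sum(mentions.values())

# Premises 1 and 2 collapsed into one step:
# share of mentions -> predicted share of votes.
predicted_share = {party: count / total for party, count in mentions.items()}

for party, share in sorted(predicted_share.items(), key=lambda kv: -kv[1]):
    print(f"{party:12s} {share:.1%}")
```

Every weakness of the method is visible here: the arithmetic is trivial, and all of the risk lives in whether those raw counts mean anything.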
I extended my reasoning to propose that the data could be trusted to paper over its own cracks, so to speak. We needn’t worry about noise; any questions of messiness or irrelevance in the data would be pretty much cancelled out by the sheer weight of its volume. I didn’t care if the mentions were supportive, derogatory, satirical or just general smart-arsery, it would all come out in the wash. If you want the full argument I’d recommend you read the article, but by the fact the data fell in a way that, kinda, validated my shaky premises, I considered the test passed.
The real proof of this pudding though would be to reproduce it. To see whether it could work again. And again. And again. Fortunately I had the opportunity of a second national vote, the Scottish Referendum, a few months later.
This was another vote that teetered on a very uncertain edge, so any heads-up on the outcome would have been most welcome.
Intention was less easy to track with the Scottish Referendum vote.
Monitoring support for a political party is relatively easy: Brandwatch can easily pick out mentions of a word like “UKIP” and determine the sentiment surrounding it. But the two sides in the Scottish vote were under the banner of “Yes” and “No”, two words it was nearly impossible to extract with any degree of relevance.
Instead, the query had to be much more sophisticated, searching for mentions of “Scotland” and “better together” along with other extractables such as the date of the vote – a much more convoluted (and error-prone) method. You can see the exact syntax used here.
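The logic (though not the actual Brandwatch syntax) of that convoluted query can be sketched as a co-occurrence filter: a tweet only counts if a campaign phrase appears alongside an anchoring term. The phrase lists and example tweets below are my own invented illustrations:

```python
# Hypothetical relevance filter mimicking the logic (not the syntax)
# of the referendum query: bare "yes"/"no" is useless, so a tweet only
# counts if a campaign phrase co-occurs with an anchoring term.
ANCHORS = ("scotland", "indyref", "18th september")
YES_PHRASES = ("yes scotland", "vote yes", "#voteyes")
NO_PHRASES = ("better together", "vote no", "#voteno")

def classify(tweet):
    text = tweet.lower()
    if not any(a in text for a in ANCHORS):
        return None  # no anchoring term: too ambiguous to count
    if any(p in text for p in YES_PHRASES):
        return "Yes"
    if any(p in text for p in NO_PHRASES):
        return "No"
    return None

sample = [
    "Vote Yes for an independent Scotland! #indyref",
    "Better Together: Scotland is stronger in the UK",
    "yes, I'll have chips with that",  # bare "yes": correctly ignored
]
print([classify(t) for t in sample])  # prints ['Yes', 'No', None]
```

Every term added to a filter like this is another judgment call, and another place for relevant mentions to slip through or irrelevant ones to sneak in.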
This meant I had considerably less faith that the volume of these mentions would be representative of voting intention, which was perhaps the only prediction I got right with this one. The result, off the back of a remarkable 85% turnout (i.e. a top quality data set) was close – 45% “Yes” to 55% “No” – yet my Twitter analysis would have predicted a landslide for the “Yes” campaign.
Retrospectively it’s easy to conjure up an explanation for this.
The “Yes” campaign was the break from the status quo. This was the case that had to be made, that had to be argued the loudest. Whereas “No” was a vote to keep things as they were, a condition that required little soapboxing. According to the polls, the swing in the final weeks of the campaign was away from “No” towards “Yes”, with some polls predicting 49:51 on the eve of the vote.
But, ultimately, the “Yes” campaign’s march wasn’t quite enough to tip the result. The Twitter data was reflecting the strength of this surge, the weight of influence, not voting intent.
So, one success, one failure, for the predictive power of social data. And next up is the big one. How useful might our data be in the run up to the 2015 General Election?
Already May 7th is looking like a particularly hard one to call. The polls currently put Labour a point or two ahead, but also show Ed Miliband trailing David Cameron by a wide margin as preferred Prime Minister. Neither party looks close to winning an overall majority, though, so the performances of the minor parties could be crucial this time around.
In 2010, when we had our last General Election, the UK was essentially a three-party system: Labour, Conservative and Liberal Democrat. This had been the status quo throughout much of the 20th Century, since the Labour Party overtook the previously dominant Liberals in 1922.
Since then one of the two leading parties, Labour or Conservative, has led a majority government (with the exception of Churchill’s “War Ministry” 1940-45). The Con-Dem Coalition of 2010 was a significant break with this tradition.
A recent Ipsos MORI report suggests this new normal may be here to stay, that it is part of a longer, gradual trend of declining support for these two parties. In 1978, 91% of the electorate supported either Labour or the Conservatives. Thirty-six years later, this now stands at 66%.
In the run up to the EU Election in May, when I conducted my first experiment, we were considered a five-party system, with UKIP and The Green Party being seen, for the first time, as forces to be reckoned with. Only a few months later this is now six, following a YouGov/MORI poll that suggested the SNP surge in the wake of the Scottish Referendum may potentially wipe Labour off the map in Scotland.
These minority parties are the big unknowns in this election. The UKIP factor, so evident in the European elections, hasn’t yet been tested in a General Election, so its scale may be hideously over- or under-estimated, depending upon which media outlets you believe. The SNP are likely to be key players, buoyed by the “45” movement spun out of the referendum defeat. And the Greens, who are suddenly a viable alternative for left-wing voters unimpressed by a Labour Party hard to distinguish from the Tories, may also continue their EU swing. All these stand to make significant gains.
But these gains won’t necessarily mean seats.
The British electoral process, a first-past-the-post system, inherently favours the established parties and protects against radical change. It is the reason a 36.1% share of the vote gave the Conservatives 47.2% of the seats in 2010. It was this un-accidental bias that allowed David Cameron to form his coalition government.
By contrast, the LibDems, their coalition partners, earned 23.0% of the popular vote in that election (how they’d dream of such a figure now), but this won them only 8% of the seats.
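The scale of that distortion is easy to put a number on, using only the percentages quoted in the two paragraphs above:

```python
# Vote share vs. seat share in the 2010 General Election,
# using the figures quoted in the text above.
results = {
    "Conservative": {"vote_share": 0.361, "seat_share": 0.472},
    "LibDem":       {"vote_share": 0.230, "seat_share": 0.08},
}

for party, r in results.items():
    # Ratio > 1: FPTP over-rewards the party; < 1: it punishes them.
    amplification = r["seat_share"] / r["vote_share"]
    print(f"{party}: {r['vote_share']:.1%} of votes -> "
          f"{r['seat_share']:.1%} of seats (x{amplification:.2f})")
```

On these numbers the Conservatives got roughly 1.3 seats’ worth of power for every unit of vote share, while the LibDems got about a third of a seat’s worth. Under pure proportional representation both ratios would be 1.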
It is this imbalance that is the chief reason a social data prediction cannot work for the 2015 General Election. The European Parliament is elected using a proportional representation system, and so the number of elected MPs was much more representative of the popular share. If a party wins a third of the vote, they get a third of the seats. This is why the volume of social media chatter correlated so well in that experiment.
On the same day as the European Election there were also local council elections in some areas. This FPTP vote took place in the same booths at the same time as the EU vote, yet those results were very different.
If we do the same comparison again, but look at the actual councillors elected by the FPTP system, you’ll notice those Euro-election headlines are nowhere to be seen. The resultant balance of power is more in line with the 2010 General Election result than the EU Election result (the one polled at the same time, in the same booths).
The status quo has been preserved. Radical change has been held in check.
From a data angle, this is a problem.
Popular opinion, which Twitter manages to measure rather well, does not correspond with how power might be distributed following a UK General Election. And the unfortunate conclusion here is that any prediction would fail not because Twitter is unrepresentative, but because the electoral system is unrepresentative.
I might argue that Twitter is actually a better gauge of opinion than a General Election. Which is a terrifying thought.
The result is further skewed by the fact that voters understand the system and have to work within its limitations. A constituent may not have the luxury of voting for the party of their choice, especially if theirs is the third party in a two-horse race. So for their vote to mean something they may instead vote “tactically” – voting for the party with the best chance of winning in their local area, rather than the party they want to lead them.
This is the difference between the Local and European results, and why one did fit my prediction while the other didn’t. When the FPTP system is used, my two-part assumption is no longer true:
Volume of mentions = Volume of interest ≠ Volume of votes
So, scroll back up to the top of this page, and you’ll see a representation of the current share of mentions for the six parties, using data extracted from Brandwatch’s Analytics app.
We could use this data to make a prediction on the General Election if we wanted to. I’d claim it would give a pretty accurate representation of the national spread of support for the various parties. But I would not expect the Twitter data to reflect the result of the vote.
I would expect it to show a truer representation of the parliament the people actually want.
Rather than the one they’re going to get.