
Can Social Data Predict the 2015 UK General Election?

By Matt Pearson on January 16th 2015

On Monday 30th March 2015, the current UK parliament will be dissolved.

Five and a half weeks later, on Thursday 7th May, there will be a General Election to decide a new one. This is not prophecy; this is the Fixed-term Parliaments Act 2011, which has decreed, for the first time, that we know these dates far in advance.

But what happens after 7th May is much harder to predict.

Can social data allow us any foresight?


Brandwatch data on Twitter mentions of the six main parties, updated every two minutes. You can access the raw data here.

One would think, with the steady rise in social media usage, and the generated data mountain growing ever taller and (importantly) broader, that the measurability of public opinion should be improving. Given the right tools and the right perspective, we should be able to read these trends with ever increasing acuity.

Surely it cannot be long before polling the electorate becomes a mere formality, with their opinions on all matters already so well advertised through the public channels on which they publish. It’s just a matter of finding the right way to mine that data.

I experimented with this idea back in May, when I attempted to second-guess the result of the 2014 European Election using Brandwatch data. Basing my analysis on Twitter conversations, I logged mentions of each of the five leading UK parties in the weeks, days and hours before polling.

I shared the data online as it was gathered, as well as placing my metaphorical balls on the chopping board by timestamping my prediction with a Tweet on the day of polling, three days ahead of the result announcement.

How accurate was it? Pretty bloody close.


How to predict an election

My social media prediction of the 2014 EU Election vote (in the UK), compared with the actual result

There were two big headlines on the morning of May 23rd 2014, UKIP’s large gains and the LibDems’ significant losses, both of which matched my prediction to within one percentage point. The data had predicted a result that at that point was highly uncertain. I wrote up the experiment focusing on these successes, quietly excusing the less accurate estimates of the Conservative, Labour and Green split.

My methodology for this experiment was based on the (spurious) premise that 1) volume of mentions might be indicative of volume of interest, and 2) volume of interest might be indicative of the expected volume of votes. Either half of this two-step correlation was eminently debatable, but I was going to trust the data to decide its validity.

Volume of mentions = Volume of interest = Volume of votes
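Reduced to code, the premise really is that simple. Here is a minimal sketch of it, using invented mention counts for illustration rather than the actual Brandwatch figures:

```python
# Hypothetical mention counts, purely for illustration --
# these are NOT the real Brandwatch numbers.
mentions = {
    "UKIP": 41_000,
    "Labour": 33_000,
    "Conservative": 29_000,
    "LibDem": 11_000,
    "Green": 9_000,
}

total = sum(mentions.values())

# The whole premise in one step: each party's share of mentions
# is read directly as its predicted share of the vote.
predicted_vote_share = {
    party: round(100 * count / total, 1) for party, count in mentions.items()
}

for party, share in sorted(predicted_vote_share.items(), key=lambda kv: -kv[1]):
    print(f"{party}: {share}%")
```

Everything interesting about the experiment lives in whether that single normalisation step is justified, not in the arithmetic itself.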

I extended my reasoning to propose that the data could be trusted to paper over its own cracks, so to speak. We needn’t worry about noise; any questions of messiness or irrelevance in the data would be pretty much cancelled out by the sheer weight of its volume. I didn’t care if the mentions were supportive, derogatory, satirical or just general smart-arsery; it would all come out in the wash. If you want the full argument I’d recommend you read the article, but because the data fell in a way that, kinda, validated my shaky premises, I considered the test passed.

The real proof of this pudding though would be to reproduce it. To see whether it could work again. And again. And again. Fortunately I had the opportunity of a second national vote, the Scottish Referendum, a few months later.

This was another vote that teetered on a very uncertain edge, so any heads-up on the outcome would have been most welcome.


How not to predict an election

Intention was less easy to track with the Scottish Referendum vote.

Monitoring support for a political party is relatively straightforward: Brandwatch can easily pick out mentions of a word like “UKIP” and determine the sentiment surrounding it. But the two sides in the Scottish vote campaigned under the banners of “Yes” and “No”, two words it was near impossible to extract with any degree of relevance.

Instead, the query had to be much more sophisticated, searching for mentions of “Scotland” and “better together” along with other extractables such as the date of the vote – a much more convoluted (and error-prone) method. You can see the exact syntax used here.

This meant I had considerably less faith that the volume of these mentions would be representative of voting intention, which was perhaps the only prediction I got right with this one. The result, off the back of a remarkable 85% turnout (i.e. a top quality data set) was close – 45% “Yes” to 55% “No” – yet my Twitter analysis would have predicted a landslide for the “Yes” campaign.

Twitter’s balance of yes/no mentions around the 2014 Scottish Independence Referendum, on the day of polling.

Retrospectively it’s easy to conjure up an explanation for this.

The “Yes” campaign was the break from the status quo. This was the case that had to be made, that had to be argued the loudest. Whereas “No” was a vote to keep things as they were, a condition that required little soapboxing. According to the polls, the swing in the final weeks of the campaign was away from “No” towards “Yes”, with some polls predicting 49:51 on the eve of the election.

But, ultimately, the “Yes” campaign’s march wasn’t quite enough to tip the result. The Twitter data was reflecting the strength of this surge, the weight of influence, not voting intent.

So, one success, one failure, for the predictive power of social data. And next up is the big one. How useful might our data be in the run up to the 2015 General Election?


The UK’s new six-party system

Already May 7th is looking like a particularly hard one to call. The polls currently put Labour a point or two ahead, but also show Ed Miliband polling well behind David Cameron as preferred next Prime Minister. Both parties are a long way from any hope of winning an overall majority, though, so the performances of the minor parties could be crucial this time around.

In 2010, when we had our last General Election, the UK was essentially a three-party system: Labour, Conservative and Liberal Democrat. This had been the status quo throughout much of the 20th Century, since the Labour Party overtook the previously dominant Liberals in 1922.

Since then one of the two leading parties, Labour or Conservative, has led a majority government (with the exception of Churchill’s “War Ministry” of 1940-45). The Con-Dem Coalition of 2010 was a significant break from this tradition.

Ipsos MORI data on the combined Labour and Conservative vote share.

A recent Ipsos MORI report suggests this new normal may be here to stay, and that it is part of a longer, gradual trend of declining support for these two parties. In 1978, 91% of the electorate supported either Labour or the Conservatives; 36 years later the figure stands at 66%.

In the run up to the EU Election in May, when I conducted my first experiment, we were considered a five-party system, with UKIP and The Green Party being seen, for the first time, as forces to be reckoned with. Only a few months later this is now six, following a YouGov/MORI poll suggesting the SNP surge in the wake of the Scottish Referendum may potentially wipe Labour off the map in Scotland.

These minority parties are the big unknowns in this election. The UKIP factor, so evident in the European elections, hasn’t yet been tested in a General Election, so its scale may be hideously over- or under-estimated, depending upon which media outlets you believe. The SNP are likely to be key players, buoyed by the “45” movement spun out of the referendum defeat. And the Greens, who are suddenly a viable alternative for left-wing voters unimpressed by a Labour Party hard to distinguish from the Tories, may also continue their EU swing. All stand to make significant gains.

But these gains won’t necessarily mean seats.


Unseating the establishment

The British electoral process, a first-past-the-post system, inherently favours the established parties and protects against radical change. It is the reason a 36.1% share of the vote gave the Conservatives 47.2% of the seats in 2010. It was this un-accidental bias that allowed David Cameron to form his coalition government.

By contrast, the LibDems, their coalition partners, earned 23.0% of the popular vote in that election (how they’d dream of such a figure now), but this won them only 8% of the seats.
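That distortion is easy to quantify: divide each party’s seat share by its vote share. Under perfect proportionality the ratio would be 1.0 for everyone. Using the 2010 figures quoted above:

```python
# 2010 General Election figures quoted in the text:
# (vote share %, seat share %)
results_2010 = {
    "Conservative": (36.1, 47.2),
    "LibDem": (23.0, 8.0),
}

for party, (votes, seats) in results_2010.items():
    # A ratio of 1.0 would mean seats exactly proportional to votes.
    ratio = seats / votes
    print(f"{party}: {ratio:.2f} seats per unit of vote share")
```

The Conservatives came out at roughly 1.3 seats per unit of vote share, the LibDems at roughly 0.35 – the same votes buying wildly different amounts of power.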

It is this imbalance that is the chief reason a social data prediction cannot work for the 2015 General Election. The European Parliament is elected using a proportional representation system, so the number of elected MEPs is much more representative of the popular share. If a party wins a third of the vote, they get a third of the seats. This is why the volume of social media chatter correlated so well in that experiment.
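For reference, Great Britain’s European seats are allocated by the D’Hondt highest-averages method, which is what keeps seats roughly in line with votes. A minimal sketch, with invented vote totals rather than real results:

```python
def dhondt(votes, seats):
    """Allocate `seats` among parties by the D'Hondt highest-averages method."""
    allocation = {party: 0 for party in votes}
    for _ in range(seats):
        # Each round, the seat goes to the party with the highest quotient:
        # votes / (seats already won + 1).
        winner = max(votes, key=lambda p: votes[p] / (allocation[p] + 1))
        allocation[winner] += 1
    return allocation

# Illustrative vote totals only.
votes = {"A": 340_000, "B": 280_000, "C": 160_000, "D": 60_000}
print(dhondt(votes, 10))  # seats track vote share fairly closely
```

With these numbers a party on roughly 40% of the vote gets roughly 40% of the seats – exactly the property that made the mention-share prediction viable for the EU vote, and that FPTP lacks.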

On the same day as the European Election there were also local council elections in some areas. This FPTP vote took place in the same booths at the same time as the EU vote, yet those results were very different.

% share of the vote in the 2014 Local Elections, compared with the EU Election result, polled on the same day. Note the overall Green Party figure is hard to determine.

If we do the same comparison again, but look at the actual councillors elected under the FPTP system, those Euro-election headlines are nowhere to be seen. The resulting balance of power is more in line with the 2010 General Election result than with the EU Election result (the one polled at the same time, in the same booths).

% share of councillors elected in 2014 Local Election, compared with the EU Election result.

The status quo has been preserved. Radical change has been held in check.

From a data angle, this is a problem.


Twitter politics

Popular opinion, which Twitter manages to measure rather well, does not correspond with how power might be distributed following a UK General Election. And the unfortunate conclusion here is that any prediction would fail not because Twitter is unrepresentative, but because the electoral system is unrepresentative.

I might argue that Twitter is actually a better gauge of opinion than a General Election. Which is a terrifying thought.

The result is further skewed by the fact that voters understand the system and have to work within its limitations. A constituent may not have the luxury of voting for the party of their choice, especially if it is a third party in a two-horse race. So for their vote to mean something they may instead vote “tactically” – voting for the party with the best chance of winning in their local area, rather than the party they want to lead them.

This is the difference between the Local and European results, and why one fitted my prediction while the other didn’t. When the FPTP system is used, my two-part assumption no longer holds:

Volume of mentions = Volume of interest ≠ Volume of votes
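A toy example makes that broken equality concrete. With invented numbers, a party can win the popular vote (and, by the premise above, the lion’s share of mentions) while still losing on seats, simply because its support is spread too evenly across constituencies:

```python
# Three hypothetical constituencies, invented numbers only.
# Party X's support is spread evenly; party Y's is concentrated.
constituencies = [
    {"X": 40, "Y": 60},   # Y wins here
    {"X": 40, "Y": 60},   # Y wins here
    {"X": 90, "Y": 10},   # X piles up votes here
]

national_vote = {"X": 0, "Y": 0}
seats = {"X": 0, "Y": 0}

for c in constituencies:
    for party, v in c.items():
        national_vote[party] += v
    # FPTP: one seat per constituency, winner takes all.
    winner = max(c, key=c.get)
    seats[winner] += 1

print(national_vote)  # {'X': 170, 'Y': 130} -- X wins the popular vote
print(seats)          # {'X': 1, 'Y': 2}     -- but Y wins most seats
```

National mention volume can only ever see the left-hand column; the election is decided by the right-hand one.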

So, scroll back up to the top of this page, and you’ll see a representation of the current share of mentions for the six parties, using data extracted from Brandwatch’s Analytics app. There are also instructions on how to access the raw data here. The data is updated hourly, so shows the current shape of the chatter.

We could use this data to make a prediction on the General Election if we wanted to. I’d claim it would give a pretty accurate representation of the national spread of support for the various parties. But I would not expect the Twitter data to reflect the result of the vote.

I would expect it to show a truer representation of the parliament the people actually want.

Rather than the one they’re going to get.


  • LinguaBrand

    Matt, this analysis is almost entirely wrong. Your EU 2014 predictions are WAY out of line with the results. Your twitter analysis got the Greens wrong by 36%, Tories by 30%, LibDems by 8%, Labour by 7% and UKIP by 1%. That’s an average 17% out; which is huge. Only your UKIP prediction came within the pollsters range of +/- 3%. Twitter got the Scotland No result wrong by 112%.

    You’re confusing percentage points with percentages.

    Maybe Twitter will do better with this election. Let’s see. Be good if it does. But it’s tendentious to claim accuracy given the examples you quote predict 1 of 6 results.

    Your take on voting systems neglects to mention the Lib-Dem coalition electoral reform deal. We were given the choice for change. The vote was firmly against. So that’s a dead duck of debate right now don’t you think?

  • Matt Pearson

    Hi LB,

    I think you’re challenging my labelling, not my analysis (I hope). The Y axis on that graph is percentage share of the vote, so if I were to mark the differences as percentages of a percentage it would only be confusing.

    So, yes, the differences are percentage points, rather than percentages. I don’t claim otherwise.

    M

  • LinguaBrand

    Our previous post couldn’t be clearer. Possibly wrong but clear. Your take on the figures is confusing in suggesting twitter has been accurate in predicting election results. Our take on your figures is that it hasn’t. Readers can read both and come to their own conclusions.

  • GiselleBodie

Hi Matt. Isn’t it true to say – and I know this will be difficult for Brandwatch – that your data analysis on both recent UK elections and, last year, the Scottish Referendum was wildly incorrect? Your analysis of social media data was even “more wrong” than the traditional polls in both cases. Brandwatch can only analyse what people who are actively involved with social media are saying, not the wider public. And the “wider public” are still not using social media to express their views in sufficient numbers to make any analysis of this kind viable or useful.

  • Matt Pearson

    Hi Giselle. Yes, it’s true to say. And this election was a bombshell for pollsters. But I’d argue the “wider public” are not represented by the subset of traditional poll respondents any more than they are by the subset of social media.

    Our style of data analysis is still very young compared to traditional polling, but I’m more optimistic for its future because 1) social data gives us a MUCH larger sample, 2) year on year the demographic is broadening, and 3) the inherent biases are (theoretically) predictable and measurable.

    This third part is the challenge. It’s a matter of preempting and adjusting for these biases, which is much easier said than done. And it’s a problem traditional polling still hasn’t solved, even with a number of decades head-start.