
Published June 20th 2008

Scoring Sentiment

Using machines to understand text is a big part of what we do here at Brandwatch, and sentiment analysis is a key part of that. As we explain in our blog post on the subject, sentiment analysis is:

“…the process of determining the emotional tone behind a series of words, used to gain an understanding of the attitudes, opinions and emotions expressed within an online mention”.

Now this is a very difficult problem to solve with machines. The human brain has evolved over millennia to become well equipped to deal with language. Language is the only way we understand each other in the absence of other signals such as body language, so it has to handle all sorts of extremely complex cognitive functions, such as tone, irony and double meaning, while still delivering the information we want to convey as efficiently as possible. On top of that, language is evolving all the time, becoming ever more abbreviated and iconified ;)

So the challenge facing a machine to make sense of language and determine sentiment is daunting indeed.

Even well-trained wetware isn’t spot-on

We ran a test a year ago in which we gave two people, both native English speakers, 1,000 articles to read about a particular topic and asked them to mark each one as positive, negative or neutral with respect to the topic. The result: they agreed with each other 85% of the time. So we're not measuring machine success out of 100, then.
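As a rough sketch of the test above (with made-up labels, not our actual data), inter-annotator agreement is just the fraction of documents on which two people picked the same label:

```python
def agreement_rate(labels_a, labels_b):
    """Fraction of documents on which two annotators gave the same label."""
    assert len(labels_a) == len(labels_b)
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Hypothetical labels from two annotators reading the same five articles.
annotator_1 = ["positive", "neutral", "negative", "neutral", "positive"]
annotator_2 = ["positive", "neutral", "neutral", "neutral", "positive"]
print(agreement_rate(annotator_1, annotator_2))  # 0.8
```

That 85% human agreement figure is effectively the ceiling any machine classifier should be measured against.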

And accuracy as a percentage is not a good measure either.

As an artificial but not entirely unrealistic example, imagine a volume of 1000 documents, made up of 900 neutral, 50 positive, and 50 negative documents. We really would like to find those 100 polar examples. A sentiment classifier that only outputs neutral decisions will miss them completely, but will reach 90% accuracy.
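The 1000-document example above can be worked through directly. A degenerate classifier that labels everything "neutral" scores 90% accuracy while finding none of the 100 polar documents, which is why we also look at recall on the polar classes:

```python
# The artificial corpus from the example: 900 neutral, 50 positive, 50 negative.
truth = ["neutral"] * 900 + ["positive"] * 50 + ["negative"] * 50
predictions = ["neutral"] * 1000  # a classifier that only ever says "neutral"

# Plain accuracy looks impressive...
accuracy = sum(t == p for t, p in zip(truth, predictions)) / len(truth)

# ...but recall on the polar documents tells the real story.
polar = [(t, p) for t, p in zip(truth, predictions) if t != "neutral"]
polar_recall = sum(t == p for t, p in polar) / len(polar)

print(accuracy)      # 0.9
print(polar_recall)  # 0.0
```

A single accuracy percentage hides exactly the documents we care about most.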

A good example of a sentiment classifier getting it wrong is a false positive or a false negative. That is to say, the machine thought something was, say, positive when in fact it was negative. It doesn't sound like a disaster, but for us it is. We call these two-hop mistakes, and we have spent a lot of time trying to reduce them as far as possible, because they erode end users' confidence in the system.

One-hop misclassifications (negatives that are actually neutral, for example) are often more subtle in nature, and as you can see from the human experiment, there is a large margin for disagreement.
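One way to picture the distinction (a sketch with hypothetical labels, not our production code) is to put the three classes on a scale from -1 to +1; a two-hop mistake is then a prediction two steps away from the truth:

```python
def count_hops(truth, predictions):
    """Count one-hop (polar/neutral mix-up) and two-hop (polarity flip) errors."""
    scale = {"negative": -1, "neutral": 0, "positive": 1}
    one_hop = two_hop = 0
    for t, p in zip(truth, predictions):
        distance = abs(scale[t] - scale[p])
        if distance == 1:
            one_hop += 1
        elif distance == 2:
            two_hop += 1
    return one_hop, two_hop

truth       = ["positive", "negative", "neutral", "positive"]
predictions = ["negative", "negative", "positive", "positive"]
print(count_hops(truth, predictions))  # (1, 1)
```

Reporting the two error counts separately makes it clear which mistakes are merely debatable and which are outright polarity flips.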

Different problem domains

Another issue is what kind of language is being used. If we take news sites and focus on something like the financial services sector, the language used is pretty consistent. The terms don't change much and there isn't much jargon. Sentences are also well formed, and pages tend to have good structure.

Contrast this with online gaming forums or Twitter data and you can probably see that we have two very different data sets to work on. The performance of our sentiment classifiers in some of these sectors isn't good yet. That's partly because their training is still a work in progress, but it's also because the problem is genuinely complex. For example, figuring out which text to pass them is often a very tough job. Imagine a forum post that says, “yes i agree with most of the above but I hate their customer service”. It may be reasonably easy for a human to work out who ‘their’ refers to, and it may not be, but machines don't do conversation, so it's extremely difficult for them! Ahh, poor silicon.

Enough with the chat – how accurate are they?

No, it's not 80%, and that's why I'm writing this post. I read that some of our competitors claim their sentiment accuracy is 80%, and I cringe. 80% of what exactly? Or 80% compared to what? This isn't a math exam! The simple answer: on well-structured and consistent text in a stable domain or industry, we can expect >90% agreement (compared to a native speaker) on the two-hop sentiment classification (i.e. not saying sentiment is positive when it's negative) and between 60% and 70% on the one-hop classification.

Not as good as we humans

But then who’s got time to read through thousands of articles every day?
