By Mercedes Lois Bull · Nov 18
Using machines to understand text is a big part of what we do here at Brandwatch, and sentiment analysis is part of that. As we explain in our blog post on sentiment analysis, it is:
“…the process of determining the emotional tone behind a series of words, used to gain an understanding of the attitudes, opinions and emotions expressed within an online mention”.
Now this is a very difficult problem to solve with machines. The human brain has evolved over millennia to become well equipped to deal with language. Language is the only way we understand each other in the absence of other signals such as body language, so it has to handle all sorts of extremely complex cognitive functions, such as tone, irony and double meaning, while delivering the information we want to convey as efficiently as possible. Furthermore, language is evolving all the time, becoming ever more abbreviated and iconified ;)
So the challenge facing a machine to make sense of language and determine sentiment is daunting indeed.
We ran a test a year ago where we gave two people, both native English speakers, 1000 articles to read about a particular topic and asked them to mark each one as positive, negative or neutral with respect to the topic. The result: they agreed with each other 85% of the time. So we’re not measuring machine success out of 100, then.
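That agreement figure is just the fraction of documents where the two annotators gave the same label. A quick sketch, using made-up labels rather than our actual test data:

```python
# Hypothetical annotations from two human raters (illustrative, not real data).
annotator_a = ["pos", "neg", "neu", "neu", "pos", "neg", "neu", "neu", "pos", "neu"]
annotator_b = ["pos", "neu", "neu", "neu", "pos", "neg", "neu", "pos", "pos", "neu"]

# Raw agreement: how often the two labels match.
matches = sum(a == b for a, b in zip(annotator_a, annotator_b))
agreement = matches / len(annotator_a)
print(f"Raw agreement: {agreement:.0%}")  # 80% on this toy sample
```

With humans topping out at roughly 85% agreement on real articles, that number, not 100%, is the sensible ceiling to judge a machine against.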
And accuracy as a percentage is not a good measure either.
As an artificial but not entirely unrealistic example, imagine a volume of 1000 documents, made up of 900 neutral, 50 positive, and 50 negative documents. We really would like to find those 100 polar examples. A sentiment classifier that only ever outputs neutral decisions will miss them completely, yet still reach 90% accuracy.
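You can see this with a few lines of code, using the numbers from the example above. The always-neutral classifier scores 90% accuracy while finding none of the polar documents we actually care about:

```python
# 900 neutral, 50 positive, 50 negative documents, as in the example above.
truth = ["neu"] * 900 + ["pos"] * 50 + ["neg"] * 50
preds = ["neu"] * 1000  # a lazy classifier that always answers "neutral"

# Accuracy looks great...
accuracy = sum(t == p for t, p in zip(truth, preds)) / len(truth)

# ...but recall on the polar documents is zero.
polar_found = sum(1 for t, p in zip(truth, preds) if t != "neu" and p == t)
polar_recall = polar_found / 100

print(f"accuracy: {accuracy:.0%}")        # 90%
print(f"polar recall: {polar_recall:.0%}")  # 0% - misses every polar document
```

This is why a headline accuracy percentage on its own tells you very little about a sentiment classifier.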
A good example of a sentiment classifier getting it badly wrong is a false positive or a false negative. That’s to say, the machine thought something was, say, positive when in fact it was negative. It might not sound like a disaster, but for us it is. We call these two-hop mistakes, and we have spent a lot of time trying to reduce them as far as possible, because they erode end users’ confidence in the system.
The one-hop misclassifications (negatives that are actually neutral, for example) are often more subtle in nature, and as you can see from the human experiment, there is a large margin for disagreement.
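A sketch of the distinction, assuming the three labels sit on a scale so that positive↔neutral and neutral↔negative are one hop apart, while positive↔negative is two hops (the label names and example pairs here are illustrative):

```python
# Place the three sentiment labels on a scale to measure "hops" between them.
rank = {"neg": -1, "neu": 0, "pos": 1}

def hops(truth: str, pred: str) -> int:
    """Distance between the true label and the predicted one."""
    return abs(rank[truth] - rank[pred])

# Hypothetical (truth, prediction) pairs.
pairs = [("pos", "pos"), ("pos", "neu"), ("neg", "pos"),
         ("neu", "neg"), ("neg", "neg"), ("pos", "neg")]

one_hop = sum(1 for t, p in pairs if hops(t, p) == 1)
two_hop = sum(1 for t, p in pairs if hops(t, p) == 2)
print(one_hop, two_hop)  # 2 one-hop errors, 2 two-hop errors
```

Reporting the two error types separately matters, because a two-hop mistake (polarity flipped outright) is far more damaging to user trust than a one-hop one.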
Another issue is the kind of language being used. If we take news sites and focus on something like the financial services sector, the language used is pretty consistent. The terms don’t change much and there isn’t much jargon. Sentences are well formed and pages tend to have good structure.
Contrast this with online gaming forums or Twitter data and you can probably see that we have two very different data sets to work on. The performance of our sentiment classifiers in some of these sectors isn’t good. That’s partly because the training they need is still a work in progress, but it’s also because the problem itself is very complex. For example, figuring out which text to pass them is often a very tough job. Imagine a forum post that says, “yes i agree with most of the above but I hate their customer service”. Maybe it’s reasonably easy for a human to work out who ‘their’ refers to, and maybe it isn’t, but machines don’t do conversation, so it’s extremely difficult for them! Ahh, poor silicon.
No, it’s not 80%, and that’s why I’m writing this post. I read that some of our competitors claim their sentiment accuracy is 80% and I cringe. 80% of what, exactly? Or 80% compared to what? This isn’t a math exam! The simple answer is: on well structured and consistent text for a stable domain or industry, we can expect >90% agreement (compared to a native speaker) on the two-hop sentiment classification (i.e. not saying sentiment is positive when it’s negative) and between 60–70% on the one-hop classification.
Not as good as we humans
But then who’s got time to read through thousands of articles every day?