How does sentiment analysis work?

By Dominick Soar on April 20th 2011

Sentiment analysis continues to be one of the most debated areas in social media monitoring, so we thought we’d give a brief overview of the topic and how we approach it at Brandwatch. Here’s our resident expert, Dr Taras Zagibalov, who conducts all our language research and works alongside the tech team to continuously improve our automatic sentiment classification. Over to Taras:

There are two major techniques used in automatic sentiment analysis. The one most frequently used in commercial applications is based on linguistic resources; the other is based on machine learning.

Linguistic Resources

This technique, in its simplest form, is based on predetermined lists of positive and negative words. The utterance or phrase in question is checked to see how many times words from each list appear in it.

A simple example:

Sky+ is good and useful but slightly expensive.

This phrase would be regarded as positive, as it contains two words from the positive list (‘good’ and ‘useful’) and only one word from the negative list (‘expensive’). A slightly more sophisticated approach may use different scores/weights for different words and account for negation (e.g. ‘not good’).
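
To make the counting idea concrete, here is a minimal sketch in Python. The word lists, the weights and the function names are all made up for illustration (a real lexicon would be far larger), and negation is handled in the crudest possible way: a word that directly follows a negator has its weight flipped.

```python
import re

# Hypothetical, hand-picked word weights; these are illustrative only.
POSITIVE = {"good": 1.0, "useful": 1.0, "love": 2.0}
NEGATIVE = {"expensive": -1.0, "bad": -1.0, "awful": -2.0}
NEGATORS = {"not", "never", "no"}


def lexicon_score(text):
    """Sum word weights, flipping the sign of a word that directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, token in enumerate(tokens):
        weight = POSITIVE.get(token, 0.0) + NEGATIVE.get(token, 0.0)
        if weight and i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight  # e.g. "not good" counts against, not for
        score += weight
    return score


def classify(text):
    """Map the summed score to a sentiment label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("Sky+ is good and useful but slightly expensive."))  # positive
print(classify("not good"))  # negative
```

With these assumed weights the Sky+ phrase scores two positive hits against one negative hit, which is why it comes out as positive.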

A further step is to handle larger linguistic units (phrases and sentences): analysis systems can rely on patterns to recognise sentiment. For example, the pattern “NP posVerb X” (noun phrase + positive verb + brand/product name) may capture phrases like “I love the Samsung Galaxy Tab” and “My friend prefers Sony”. But this approach can involve a number of linguistic techniques (parsing, part-of-speech tagging, entity extraction and so on) which are not always robust and are often quite time- and labour-intensive.
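
As a rough illustration of the “NP posVerb X” pattern, the sketch below uses a plain regular expression instead of a real parser. This is a deliberate simplification: the verb list, the brand list and the function name are assumptions made for the example, and the “noun phrase” is approximated as whatever words precede the verb.

```python
import re

# Hypothetical lists; a production system would rely on part-of-speech
# tagging and entity extraction rather than fixed word lists.
POSITIVE_VERBS = ["loves", "love", "prefers", "prefer", "likes", "like"]
BRANDS = ["Samsung Galaxy Tab", "Sony"]

# "NP posVerb X": subject words, a positive verb, an optional "the", a brand.
PATTERN = re.compile(
    r"(?P<np>\b[\w' ]+?)\s+"
    r"(?P<verb>" + "|".join(POSITIVE_VERBS) + r")\s+"
    r"(?:the\s+)?"
    r"(?P<brand>" + "|".join(re.escape(b) for b in BRANDS) + r")\b",
    re.IGNORECASE,
)


def find_positive_mentions(text):
    """Return (noun phrase, verb, brand) triples matched by the pattern."""
    return [
        (m.group("np").strip(), m.group("verb").lower(), m.group("brand"))
        for m in PATTERN.finditer(text)
    ]


print(find_positive_mentions("I love the Samsung Galaxy Tab"))
print(find_positive_mentions("My friend prefers Sony"))
```

Even this toy version hints at the maintenance cost: every new verb, brand or word order needs another rule.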

The main pitfall of sentiment analysis based on linguistic resources is that we cannot always predict the ways sentiment is expressed:

  • It is difficult to define sentiment orientation of topic-dependent words, for example “long”: “long battery life” is good but “a long wait” may not be good at all.
  • The word “still” may be a useful indicator of positive sentiment in some contexts: “…still, I love this gadget”, but negative in others: “…still, I’m not happy with the service”.
  • The word “good” can even have a slightly negative meaning, in eBay reviews for example: it is common to say “perfect delivery” or “outstanding delivery” if being genuinely positive, while saying “good delivery” can be taken to mean mediocre.

This approach also assumes that people use “normal” or “standard” (predictable) language. This is rarely the case, particularly in social media, where people use all kinds of dialects and slang to express their feelings (“lol it sucks”, “that is so sick” and so on).

Machine Learning

The other technique (based on machine learning) relies on a computer’s ability to automatically learn the language used for expressing sentiment regardless of how “good” or “normal” the language is.

But there is no magic and nothing comes for free. The machine needs some information to learn from (called a training corpus) and, in the case of sentiment analysis, this is a set of examples annotated by humans. The more examples the machine has to learn from the better – thousands of examples are better than hundreds.

Once the machine has learned from the examples, it can apply the acquired knowledge to new, unseen documents and classify them into sentiment categories. But this technology isn’t perfect either. The problem is domain-dependency: if a machine was trained on a corpus of movie reviews, it will be very inaccurate if applied to, say, reviews of automobiles. This means that a machine has to be trained on every domain in which it is to be used.
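
The learn-then-classify cycle can be sketched with a tiny Naive Bayes classifier. Naive Bayes is used here only because it is a common textbook choice and fits in a few lines; the four-example training corpus is invented for the illustration, and a real corpus would hold thousands of human-annotated examples.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of examples
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.doc_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocab.add(token)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_logp = None, -math.inf
        for label in sorted(self.doc_counts):
            logp = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # words never seen in training carry no evidence
                count = self.word_counts[label][token]
                logp += math.log((count + 1) / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Invented, human-labelled training corpus (far too small for real use).
corpus = [
    ("lol it sucks", "negative"),
    ("that is so sick", "positive"),
    ("I love this gadget", "positive"),
    ("I'm not happy with the service", "negative"),
]

model = NaiveBayes()
model.train(corpus)
print(model.classify("this gadget is so sick"))  # positive
print(model.classify("the service sucks"))       # negative
```

Note that the model has learned “sick” as a positive word purely from the labelled examples. The same property explains domain-dependency: the model knows only the vocabulary of the corpus it was trained on.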

Brandwatch Sentiment Analysis

At Brandwatch, our sentiment analysis system is based on the second technique: machine learning.

When you set up a query in Brandwatch, you are asked to select an industry for the query before saving it. This is the part which addresses the hurdle faced by machine learning sentiment analysis: by selecting the industry you are telling our sentiment machine which domain or which classifier it should use. It then knows which context your results are likely to be in, and will use its knowledge from that domain to classify accordingly.
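
The idea of routing a query to a domain-specific classifier can be sketched as a simple lookup. This is not Brandwatch’s implementation: the industry names, the word weights and the function name are assumptions, and small word-weight tables stand in for trained classifiers to keep the example short.

```python
import re

# Hypothetical stand-ins for trained per-domain classifiers. Note that
# "long" carries opposite weights in the two domains.
DOMAIN_WEIGHTS = {
    "electronics": {"long": 1, "love": 1, "slow": -1},
    "customer service": {"long": -1, "helpful": 1, "slow": -1},
}


def classify_for_industry(text, industry):
    """Pick the classifier for the chosen industry and apply it to the text."""
    try:
        weights = DOMAIN_WEIGHTS[industry]
    except KeyError:
        raise ValueError(f"no classifier for industry: {industry!r}")
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(weights.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_for_industry("long battery life", "electronics"))  # positive
print(classify_for_industry("a long wait", "customer service"))   # negative
```

The same word, “long”, is scored differently depending on the industry selected, which is the point of choosing a domain up front.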

We have over 500 classifiers across all the languages we cover. We are currently working on a feature that will let the system look at your query and automatically detect which classifier/industry is most appropriate, based on the terms you have used. Automating this step should help keep the sentiment categorisation as accurate as possible.

Take a look at Taras’ guest post on Social Times, “Sentiment Analysis: When Machines Can Beat Humans”, for further discussion.

Dominick Soar


Dominick is social media and content manager at ticketscript, Europe's leading self-service ticketing software, offering free solutions across digital, mobile and social. Dom is also a Brandwatch alumnus, and has a rich heritage in social media marketing, especially in the marketing technologies space.