Sentiment analysis continues to be one of the most debated areas in social media monitoring, so we thought we’d give a brief overview of the topic and how we approach it at Brandwatch. Here’s our resident expert, Dr Taras Zagibalov, who conducts all our language research and works alongside the tech team to continuously improve our automatic sentiment classification. Over to Taras:
There are two major techniques used in automatic sentiment analysis. The one most frequently used in commercial applications is based on linguistic resources; the other is based on machine learning.
The linguistic-resource technique, in its simplest form, is based on predetermined lists of positive and negative words. The utterance or phrase in question is checked for how many times any of these words appear in it.
A simple example:
Sky+ is good and useful but slightly expensive.
This phrase would be regarded as positive, as it contains two words from the positive list (‘good’ and ‘useful’) and only one word from the negative list (‘expensive’). A slightly more sophisticated approach may use different scores/weights for different words and account for negation (e.g. ‘not good’).
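The word-list approach described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `WEIGHTS` and `NEGATORS` tables and the `lexicon_sentiment` function are hypothetical names invented for this example, and a real lexicon would contain thousands of entries.

```python
import re

# Hypothetical lexicon for illustration only. Every word has weight +1 or -1
# here, which reproduces simple counting; other weights could be used instead.
WEIGHTS = {"good": 1.0, "useful": 1.0, "expensive": -1.0}
NEGATORS = {"not", "never", "no"}


def lexicon_sentiment(text):
    """Sum the weights of known words, flipping the sign after a negator."""
    tokens = re.findall(r"[a-z+']+", text.lower())
    score = 0.0
    for i, token in enumerate(tokens):
        weight = WEIGHTS.get(token, 0.0)
        # Very simple negation handling: only the immediately preceding word.
        if weight and i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight
        score += weight
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment("Sky+ is good and useful but slightly expensive."))
print(lexicon_sentiment("Sky+ is not good."))
```

With these toy lists the Sky+ sentence scores two positive words against one negative word, and "not good" is treated as negative because the negator flips the weight of the word that follows it.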
A further step is to take care of larger linguistic units (phrases and sentences): analysis systems can rely on patterns to recognise sentiment. For example, the pattern “NP posVerb X” (Noun Phrase + positive verb + Brand/product name) may capture phrases like “I love the Samsung Galaxy Tab” and “My friend prefers Sony”. But this approach can involve a number of linguistic techniques (parsing, part-of-speech tagging, entity extraction and so on) which are not always robust and are often quite time- and labour-intensive.
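As a rough illustration of the “NP posVerb X” pattern, here is a regular-expression sketch in Python. It is a deliberate simplification: the noun phrase is approximated by a pronoun or a determiner plus one word rather than by a real parser, and the verb list, brand list and `match_positive_mention` function are assumptions made up for this example.

```python
import re

# Hypothetical resources for illustration only.
POSITIVE_VERBS = ["love", "loves", "prefer", "prefers", "like", "likes"]
BRANDS = ["Samsung Galaxy Tab", "Sony"]

# Crude stand-in for a noun phrase: a pronoun, or a determiner plus one word.
NOUN_PHRASE = r"(?:I|we|you|he|she|they|(?:my|our|the|a)\s+\w+)"
VERB = "|".join(POSITIVE_VERBS)
BRAND = "|".join(re.escape(name) for name in BRANDS)

# NP + positive verb + (optional "the") + brand or product name.
PATTERN = re.compile(
    r"\b(?:" + NOUN_PHRASE + r")\s+(?:" + VERB + r")\s+(?:the\s+)?"
    r"(?P<brand>" + BRAND + r")\b",
    re.IGNORECASE,
)


def match_positive_mention(text):
    """Return the brand named in a positive 'NP posVerb X' phrase, else None."""
    match = PATTERN.search(text)
    return match.group("brand") if match else None


print(match_positive_mention("I love the Samsung Galaxy Tab"))
print(match_positive_mention("My friend prefers Sony"))
print(match_positive_mention("My friend sold his Sony"))
```

Even this small sketch shows why the approach is labour-intensive: every new verb, brand or sentence shape has to be anticipated and added by hand.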
The main pitfall of sentiment analysis based on linguistic resources is that we cannot always predict the ways sentiment is expressed.
This approach also assumes that people use “normal” or “standard” (predictable) language. This is rarely the case, and particularly not in social media; people use all kinds of dialects and slang to express their feelings (“lol it sucks” or “that is so sick” etc).
The other technique (based on machine learning) relies on a computer’s ability to automatically learn the language used for expressing sentiment regardless of how “good” or “normal” the language is.
But there is no magic and nothing comes for free. The machine needs some information to learn from (called a training corpus) and, in the case of sentiment analysis, this is a set of examples annotated by humans. The more examples the machine has to learn from the better – thousands of examples are better than hundreds.
Once the machine has learned the examples it can apply the acquired knowledge to new, unseen documents and classify them into sentiment categories. But this technology isn’t perfect either. The problem is domain-dependency: if a machine was trained on a corpus of movie reviews, it will be very inaccurate if applied to, say, reviews of automobiles. This means the machine needs to be trained separately for every domain in which it is to be used.
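To make the learning step concrete, here is a minimal sketch of one common machine-learning technique, a Naive Bayes classifier with add-one smoothing, written in plain Python. It is not a description of any particular commercial system: the `NaiveBayesSentiment` class and the six-line `TRAINING_CORPUS` are hypothetical, and a usable training corpus would need thousands of human-annotated examples.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of examples
        self.vocabulary = set()

    def train(self, examples):
        for text, label in examples:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # words never seen in training carry no evidence
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny hypothetical training corpus, annotated by hand for illustration.
TRAINING_CORPUS = [
    ("lol it sucks", "negative"),
    ("this phone is terrible", "negative"),
    ("worst service ever", "negative"),
    ("that is so sick", "positive"),
    ("i love this phone", "positive"),
    ("great service and useful app", "positive"),
]

model = NaiveBayesSentiment()
model.train(TRAINING_CORPUS)
print(model.classify("this app is so sick"))
print(model.classify("lol it sucks"))
```

Note that the classifier learns that “sick” is positive purely from the annotated examples, with no word list saying so. The same mechanism explains domain-dependency: the model only knows the vocabulary and usage of the corpus it was trained on.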
Brandwatch Sentiment Analysis
At Brandwatch, our sentiment analysis system is based on the second technique: machine learning.
When you set up a query in Brandwatch, you are asked to select an industry for the query before saving it. This step addresses the domain-dependency hurdle faced by machine-learning sentiment analysis: by selecting the industry you are telling our sentiment machine which domain, and therefore which classifier, it should use. It then knows which context your results are likely to be in, and will use its knowledge from that domain to classify accordingly.
We have over 500 classifiers across all the languages we cover. We are currently working on a development that will let the system look at your query and automatically detect which classifier/industry is most appropriate based on the terms you have used. Automating this process should help make the sentiment categorisation as accurate as possible.
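The idea of choosing a classifier by industry can be pictured as a simple lookup. The sketch below is purely hypothetical and is not Brandwatch's actual code: the `CLASSIFIERS` registry, the `classify_mention` function and the two toy classifiers are invented here, with one-word rules standing in for trained models.

```python
from typing import Callable, Dict, Tuple

Classifier = Callable[[str], str]


def movies_en(text: str) -> str:
    # Stand-in for a model trained on movie reviews (hypothetical).
    return "positive" if "unpredictable" in text.lower() else "neutral"


def automotive_en(text: str) -> str:
    # Stand-in for a model trained on automobile reviews (hypothetical).
    return "negative" if "unpredictable" in text.lower() else "neutral"


# One classifier per (language, industry) pair.
CLASSIFIERS: Dict[Tuple[str, str], Classifier] = {
    ("en", "movies"): movies_en,
    ("en", "automotive"): automotive_en,
}


def classify_mention(text: str, language: str, industry: str) -> str:
    """Route a mention to the classifier trained for its language and domain."""
    key = (language, industry)
    if key not in CLASSIFIERS:
        raise ValueError("no classifier trained for %s/%s" % key)
    return CLASSIFIERS[key](text)


print(classify_mention("Totally unpredictable", "en", "movies"))
print(classify_mention("Totally unpredictable", "en", "automotive"))
```

The same words receive different labels depending on the industry chosen, which is the point of asking for it: an unpredictable plot is usually praise, while unpredictable steering is usually a complaint.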
Take a look at Taras’ guest post on Social Times, “Sentiment Analysis: When Machines Can Beat Humans”, for further discussion.