Interview: Carnegie Mellon Professor Ari Lightman On How Students Are Empowered By Learning To Use Brandwatch Consumer Research
By Kara Finnerty, Jun 10
Social media monitoring is a bit like finding a needle in a haystack, or rather lots of needles in a very big haystack. For most brands there are dozens or hundreds of interesting mentions every day, but the problem is that they are hidden among thousands of less interesting or completely irrelevant mentions.
Defining the query properly, with Boolean words like AND, NOT, OR etc., can reduce this problem so that it feels more like looking for needles in a mere handful of hay (apologies for stretching this metaphor uncomfortably far).
Looking for a needle? (Courtesy of sibotk)
Why do I need a complicated query when a simple one works with Google?
Simple is good, and wherever possible I write a simple query (see golden rule no. 1, below). That said, a simple search on Google often gives the user thousands of irrelevant results; we mostly don’t notice because the Google ranking algorithm means the top 10 results, i.e., the ones we see, are usually good enough.
So, for example, I might be trying to find reviews of the best apple juice and so I type “Best Apple Juice” in Google (without the quotation marks – I’ve added those to fit with editorial convention). I have not bothered to use any sophisticated Boolean terms like AND, NOT, etc. Google then tells me it has found over 53m web pages and shows me links to the 10 most relevant ones. Several of these 10 are interesting.
But what about the other 53 million sites: are they all about the best apple juice? Well, for starters, the 53 million figure is dubious (see Giles’ blog about this here), but if I were to look more closely at some of the search results after the first page, I would find that Google has included pages that are just about apples (e.g., recipes with apples, tips on growing apples, etc.), about all sorts of best things, and about all sorts of juice. These don’t appear on the first page because Google is clever enough to rank the pages that include all 3 terms before any pages that include only 1 or 2 of the main terms (‘best’, ‘apple’ and ‘juice’).
Because social media monitoring is about looking at all the relevant pages about a brand or topic, we have to be more precise and that means writing a more complicated query, involving ANDs, ORs, NOTs and other Boolean Operators.
In the context of search queries, Boolean operators are words that give a computer criteria to use when searching web pages. Words like AND, NOT and OR are all Boolean operators. You may have studied Boolean logic when you were a young teenager (or younger, if you had an expensive education) when you looked at sets, subsets and Venn diagrams. There are some excellent introductions to Boolean logic on the web. Rather than repeat that material here, let me tell you which Boolean operators you can use with Brandwatch and how they work.
Example of Venn diagram (courtesy of jimmiehomeschoolmom)
Using Boolean operators in queries for social media monitoring
Brandwatch’s social media monitoring tool allows subscribers to use 10 operators. Many of them can be used together (I note where they can’t be). The 10 operators are:
1. “” double quotes
Double quotes find web pages where the text in the quotation marks appears in that order without any other words in the middle. So, for example, searching for “apple juice” (keep the quotation marks) will find a site that says ‘I love apple juice’ and ignore sites that mention just apples or just juice. Bear in mind that it can sometimes be too restrictive, e.g., “apple juice” will not find a web page that says ‘I like Apple and Orange Juice’.
2. AND (also can use +)
AND requires the web page to contain both sets of terms that AND refers to, e.g., Apple AND Juice will find web sites that mention both Apple and Juice. A web page that says ‘I love Apple and Orange Juice’ will be found with this query. Remember that AND has to be capitalised.
3. OR
OR requires the web page to contain at least one of two terms, e.g., Apple OR Juice will find web sites that mention either Apple or Juice. A web page that says ‘I just went shopping; bought some apples, bananas, etc’ will be found with this query. When using OR, remember it has to be capitalised.
4. NOT (also can use -)
NOT requires the web page to not include a term, e.g., “Apple Juice” NOT bubblegum will find web sites that mention “Apple Juice” and that do not mention bubblegum. A web page that says ‘I love apple juice flavoured bubblegum’ will be excluded from the results of this query. As with AND and OR, NOT has to be capitalised.
5. () i.e., brackets or parentheses
Parentheses are used to group terms together, so that operators like AND and NOT can be applied to all the terms in the brackets, e.g., “Apple Juice” NOT (bubblegum OR “bubble gum” OR sweets) will find web sites that mention Apple Juice and then exclude those that contain either spelling of bubblegum or that contain sweets. So web sites with phrases like “apple juice flavoured bubblegum” or “apple juice flavoured sweets” will all be excluded.
You can build quite complicated queries by using multiple parentheses; we will show these in future posts.
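To make the logic of operators 1 to 5 concrete, here is a minimal Python sketch of how such a query could be evaluated against a piece of text. It is only an illustration of the Boolean logic, not how Brandwatch works internally; the helper names `words`, `has_phrase` and `matches` are made up for this example.

```python
import re


def words(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_phrase(text, phrase):
    """True if the words of `phrase` appear in `text`, in order and adjacent."""
    haystack, needle = words(text), words(phrase)
    return any(
        haystack[i:i + len(needle)] == needle
        for i in range(len(haystack) - len(needle) + 1)
    )


def matches(text):
    """Hand-written equivalent of:
    "Apple Juice" NOT (bubblegum OR "bubble gum" OR sweets)
    """
    return has_phrase(text, "apple juice") and not (
        has_phrase(text, "bubblegum")
        or has_phrase(text, "bubble gum")
        or has_phrase(text, "sweets")
    )


if __name__ == "__main__":
    pages = [
        "I love apple juice",
        "I love apple juice flavoured bubblegum",
        "I like Apple and Orange Juice",
    ]
    for page in pages:
        print(matches(page), "-", page)
```

Only the first page matches: the second is removed by the NOT group, and the third never satisfied the quoted phrase in the first place.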
6. ~ the Tilde or the squiggle or the proximity operator
The “~” symbol (called a tilde) is used to find web pages where two (or more) words appear within a few words of each other. For this reason it is called the proximity operator. The proximity operator is always used in conjunction with quotation marks and with a number, which indicates the maximum number of words apart the terms can be. So, for example, “Apple Juice”~5 will find all web pages that contain both Apple and Juice within 5 words of each other. A web page with the phrase “the new apple and cranberry juice is on sale now” will be included using this query, whereas a web site with this quote will not be: “I like the golden delicious apple best, they are firm, crispy and not too much juice” (as juice and apple are 9 words apart).
7. site:
This operator allows the user either to restrict a search to one (or more) sites, or to exclude one (or more) sites from a query. “Apple Juice” AND site:twitter will provide all the mentions of Apple Juice on Twitter; similarly, “Apple Juice” NOT site:twitter can be used to find any mention of Apple Juice except on Twitter. “site:” will recognise any component of a URL, so you don’t need to write out the entire URL; a component in this case means anything that lies between two periods (full stops) or hyphens (“-”). So in the above example you don’t need to write the whole URL (it is okay if you do); ‘twitter’ will do.
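The proximity check can be sketched in a few lines of Python. This is an illustration only: the function name `within` is invented, and it assumes that "words apart" means the number of words sitting between the two terms, which is how the 9 in the example above is counted.

```python
import re


def words(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def within(text, first, second, max_gap):
    """True if `first` and `second` occur with at most `max_gap` words between them.

    Assumption: distance is the count of intervening words, in either order.
    """
    tokens = words(text)
    first_positions = [i for i, w in enumerate(tokens) if w == first.lower()]
    second_positions = [i for i, w in enumerate(tokens) if w == second.lower()]
    return any(
        abs(a - b) - 1 <= max_gap
        for a in first_positions
        for b in second_positions
    )


if __name__ == "__main__":
    near = "the new apple and cranberry juice is on sale now"
    far = ("I like the golden delicious apple best, they are firm, "
           "crispy and not too much juice")
    print(within(near, "apple", "juice", 5))
    print(within(far, "apple", "juice", 5))
```

The first sentence passes because only two words separate the terms; the second fails because nine do.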
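As a rough illustration of the "component" idea behind site:, the sketch below splits a host name on periods and hyphens and checks whether the search term is one of the pieces. The function name `site_matches` is hypothetical, and looking only at the host name (not the rest of the URL) is my assumption for this example rather than a description of the real matching.

```python
import re
from urllib.parse import urlparse


def site_matches(url, term):
    """True if `term` is the whole host name or one of its components.

    Assumption: components are the parts of the host name separated by
    periods or hyphens. The URL must include a scheme such as https://.
    """
    host = urlparse(url).hostname or ""
    components = re.split(r"[.-]", host)
    term = term.lower()
    return term == host or term in components


if __name__ == "__main__":
    print(site_matches("https://twitter.com/brandwatch", "twitter"))
    print(site_matches("https://www.autotrader.com/cars", "www.autotrader.com"))
    print(site_matches("https://nottwitter.example.org/", "twitter"))
```

Matching whole components rather than substrings is what stops a term like twitter from accidentally matching a host such as nottwitter.example.org.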
8. raw: and notes on Punctuation
When we search web pages we ignore all punctuation, remove non-alphanumeric characters (e.g., exclamation marks, plus symbols, German umlauts, etc.) and treat the text as all lower case. The “raw:” operator makes the search case sensitive again and keeps the special characters.
So a query for “apple juice!” will actually only look for “apple juice”, as the exclamation mark is stripped out. Similarly, a search for “Sky+” (a UK satellite receiver box) will actually search for “sky”. raw: avoids this, so it would include the exclamation mark in “apple juice!” and both the capital “S” and the “+” for “Sky+”.
“raw:” should be lower case and it needs to be followed by a colon and no space; e.g., raw:”apple juice!”.
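The difference between the default behaviour and raw: can be sketched as follows. This is a simplified illustration with invented function names (`normalise`, `default_match`, `raw_match`); it handles plain ASCII text only, and treating raw: as an exact, case-sensitive substring test is an assumption made for the example.

```python
import re


def normalise(text):
    """Lower-case the text and drop everything except letters, digits and spaces."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(cleaned.split())


def default_match(page, term):
    """Default search: compare the normalised term with the normalised page."""
    return f" {normalise(term)} " in f" {normalise(page)} "


def raw_match(page, term):
    """raw: search (assumed here to be an exact, case-sensitive match)."""
    return term in page


if __name__ == "__main__":
    print(normalise("Sky+"))           # the term that is actually searched for
    print(normalise("apple juice!"))
    print(default_match("Look at the sky tonight", "Sky+"))
    print(raw_match("Look at the sky tonight", "Sky+"))
    print(raw_match("My Sky+ box broke", "Sky+"))
```

The page about the night sky is picked up by the default search for “Sky+” but not by the raw version, which only accepts the capital “S” and the “+”.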
9. title:
“title:” works much like “site:”, as it finds (or excludes) web pages where a search term is in the title of the article. So the query title:”Apple Juice” will only find sites with Apple Juice in the title of the article, e.g., a blog with the title “My reviews of 10 supermarkets’ Apple Juice”.
“title:” itself should be lower case and followed by a colon and no space.
10. location:
location: is our newest operator and will go live on the site soon (October). Using it in a query will include only web pages that are from that country. The query “Apple Juice” AND location:uk will provide web pages that feature the term “Apple Juice” and where the author of the post or the website itself is registered as being in the UK. Note: many web pages do not indicate location and so default to the location of the web site. In many cases, this default is the USA, even though the blog post may be written by, say, an Australian resident. Twitter and Facebook (and some other sites) are special cases; these sites publish the user’s location based on either profiles or IP addresses, and for these sites we use this information to determine location.
Avoiding common mistakes
The three most common mistakes in writing a query are:
a) Not capitalising ‘NOT’, ‘AND’, ‘OR’, etc.
b) Forgetting to use one or both quotation marks for phrases, e.g., “Apple Juice” OR Apple Juce should be “Apple Juice” OR “Apple Juce”
c) Forgetting to close brackets
Trying to find the mistake? (courtesy of albany_tim)
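Two of these mistakes, (a) and (c), plus a missing quotation mark from (b), are mechanical enough to catch with a small script before you run a query. The sketch below is only an illustration; `lint_query` is a made-up helper, not a Brandwatch feature. It cannot tell when a phrase has no quotation marks at all, and it will also flag a lower-case “and”, “or” or “not” that you genuinely meant as a search word.

```python
import re

OPERATORS = {"and", "or", "not"}


def lint_query(query):
    """Return a list of likely mistakes found in a Boolean query string."""
    problems = []
    # Treat curly and straight double quotes the same way.
    q = query.replace("\u201c", '"').replace("\u201d", '"')

    if q.count('"') % 2:
        problems.append("unbalanced quotation marks")

    # Ignore anything inside properly quoted phrases for the remaining checks.
    outside = re.sub(r'"[^"]*"', " ", q)

    depth = 0
    for ch in outside:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing bracket with no opener
                break
    if depth != 0:
        problems.append("unbalanced brackets")

    for word in re.findall(r"[A-Za-z]+", outside):
        if word.lower() in OPERATORS and word != word.upper():
            problems.append(f"operator '{word}' is not capitalised")

    return problems


if __name__ == "__main__":
    print(lint_query('"Apple Juice" NOT (bubblegum OR sweets)'))
    print(lint_query('"Apple Juice" not (bubblegum OR sweets'))
```

The first query comes back clean; the second is reported for both the unclosed bracket and the lower-case “not”.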
Finally, some golden rules for using these operators
Rule 1: Simple is best. If the simple query works then don’t try to overcomplicate it with ANDs, NOTs, ORs, etc. Some brands have unique names that don’t get used elsewhere. So in the automotive world, the (VW) Touareg is a pretty unique name and so won’t need a complicated query; the (Ford) Focus is not unique and so would need lots of exclusions and/or context words.
Rule 2: 80:20 rule: You can usually get to 8 out of 10 relevant mentions with relatively little work (20% of the effort) but getting a perfect query is hard to achieve (80% of the work) and may not be worth it. I have spent a lot of time trying to remove irrelevant types of sites; at first this is easy, but then I find a few persistent irrelevant web pages and getting rid of these is either risky or time consuming. It can be risky as I start to remove useful relevant web pages, and it can be time consuming if I have 1,000 mentions a month and have to add a new NOT term for each of 200 irrelevant web pages.
Rule 3: In 95% of cases AND, NOT, OR and quotation marks (“”) are the best operators to use.
Rule 4: Even when the query is complicated, it is usually best to have a simple structure made up of 3 parts:
[Main term] AND [context terms] NOT [excluded terms]
E.g., if we were looking for problems with the VW Passat, I might write:
“Passat” AND (repair OR problems) NOT (“spare parts” OR site:www.autotrader.com)
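If you build many queries of this shape, the three-part structure is easy to assemble programmatically. The sketch below is one hypothetical way to do it in Python; `or_group` and `build_query` are names invented for this example, and it uses straight quotation marks where the post shows curly ones.

```python
def or_group(terms):
    """Join terms with OR, adding brackets when there is more than one."""
    if len(terms) == 1:
        return terms[0]
    return "(" + " OR ".join(terms) + ")"


def build_query(main_term, context_terms=(), excluded_terms=()):
    """Assemble: [Main term] AND [context terms] NOT [excluded terms]."""
    parts = [main_term]
    if context_terms:
        parts.append("AND " + or_group(list(context_terms)))
    if excluded_terms:
        parts.append("NOT " + or_group(list(excluded_terms)))
    return " ".join(parts)


if __name__ == "__main__":
    query = build_query(
        '"Passat"',
        context_terms=["repair", "problems"],
        excluded_terms=['"spare parts"', "site:www.autotrader.com"],
    )
    print(query)
```

Keeping the context and exclusion lists separate makes it simple to add one more NOT term later without disturbing the rest of the query.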
Want more guidance?
We will roll out more posts, including examples and videos, on creating queries over the next few weeks, and will tweet these when they come out.
Note on images: all images are licensed under Creative Commons.