
Published October 16th 2013

A Guide To Eliminating Spam in Your Data: 5 Top Tips

Nobody likes or needs spam. Nevertheless, you can find it everywhere: websites, social media platforms, emails, forums, and so on.

There’s a constant struggle against spammers and bots, especially in the social media monitoring industry, where clean data is essential for capturing relevant mentions of the brand or industry you’re interested in.

Spam-free data allows a more accurate breakdown and analysis of your mentions, which leads to clearer, more pertinent conclusions about, for instance, the online image of your business.

How does Brandwatch get rid of spam?

Several measures are in place to prevent spam from appearing in your data.

Long story short, we develop complex algorithms that look at word frequency and other indicators of spam, and we run industry-specific searches that target common spam phrases, such as “best deal” or “replica handbag”.
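To make the phrase-targeting idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Brandwatch’s actual implementation: the phrase list simply reuses the two examples above, and the name `matched_spam_phrases` is hypothetical.

```python
import re

# Hypothetical phrase list, seeded with the two examples quoted in this post.
SPAM_PHRASES = ["best deal", "replica handbag"]

# One case-insensitive pattern per phrase. The leading \b stops a match from
# starting mid-word; there is no trailing \b, so plurals such as
# "replica handbags" still match.
PATTERNS = {
    phrase: re.compile(r"\b" + re.escape(phrase), re.IGNORECASE)
    for phrase in SPAM_PHRASES
}


def matched_spam_phrases(text: str) -> list[str]:
    """Return the spam phrases found in `text`, in the order they are listed."""
    return [phrase for phrase, pattern in PATTERNS.items() if pattern.search(text)]
```

For example, `matched_spam_phrases("Get the BEST DEAL on replica handbags!")` returns both phrases, while an ordinary post about a new handbag returns an empty list. In practice the list would be extended for each industry you monitor.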

Additionally, we have created blacklists of known bad sites based on customer feedback and analyst reviews, and we also run keyword-density checks to detect keyword-stuffed SEO text.
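A keyword-density check measures how much of a text is taken up by a single term; copy written purely for SEO tends to repeat its target keyword unnaturally often. The sketch below is illustrative only: the 5% threshold, the 50-word minimum and the function names are assumptions, not Brandwatch settings.

```python
import re


def words_in(text: str) -> list[str]:
    """Lower-cased word tokens (letters, digits and apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_density(text: str, keyword: str) -> float:
    """Share of the words in `text` that are exactly `keyword` (case-insensitive)."""
    words = words_in(text)
    return words.count(keyword.lower()) / len(words) if words else 0.0


def looks_like_seo_text(text: str, keyword: str,
                        threshold: float = 0.05, min_words: int = 50) -> bool:
    """Flag texts of at least `min_words` words whose keyword density exceeds `threshold`."""
    if len(words_in(text)) < min_words:
        return False  # in short posts any mentioned word has a high density
    return keyword_density(text, keyword) > threshold
```

The minimum length matters: in a three-word tweet, any word it mentions has a density above 30%, so short texts are left to the other checks.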

Ultimately, the goal is to keep spam to a minimum while continuously increasing the number of instances we crawl on a daily basis.

5 tips to fight spam

Brandwatch is great at removing spam, but it’s always worth going the extra mile to make sure there are no junk mentions among your results:

1) When testing a Query, always check the last page of mentions to see how relevant your results are. As long as they’re still related to the topic, keyword or brand you’re investigating, you’re on the right track.

2) Create exclusion strings that filter out sneaky spam and add them to your Query. The example below shows some of the most common spam-related words, which you should remember to exclude from all your Queries (unless, of course, they are relevant to your brand!).
[Image: example list of common spammy words to exclude]

3) Once you’re happy with your Query and you’ve created a dashboard, take a look at the top authors, top sites and main topics of discussion. Look for junk indicators, such as spammy Twitter accounts or suspicious forums, and exclude them from your Query. If you can’t find any, your data should be ready to be sliced and diced.

4) Create your own blacklist of spammy words and websites for the industry you’re researching and exclude them from your Query – it will save you time and keep your data clean.

5) Report any spammy mentions to the team by clicking “report mention as spam”, and we’ll make sure they’re dealt with.

[Image: the “report mention as spam” option]
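Tips 2 to 4 lend themselves to a little scripting if you keep your exclusions in lists or export your mentions. The Python sketch below is illustrative only. It assumes Boolean query syntax (`AND NOT`, `OR`, parentheses, double-quoted phrases) and a `site:` operator for domains, so check your platform’s query documentation before pasting the output into a Query. The word and site lists, the mention fields `author` and `site`, and all function names are hypothetical.

```python
from collections import Counter


def quote(term: str) -> str:
    """Wrap multi-word phrases in double quotes so they match as phrases."""
    return f'"{term}"' if " " in term else term


def with_exclusions(base_query: str, words: list[str], sites: list[str]) -> str:
    """Append an AND NOT (...) clause excluding spammy words and blacklisted sites."""
    # Assumed syntax: "site:domain" refers to a website; verify against your platform.
    terms = [quote(w) for w in words] + [f"site:{s}" for s in sites]
    if not terms:
        return base_query
    return f"({base_query}) AND NOT ({' OR '.join(terms)})"


def dominant_sources(mentions: list[dict], field: str, min_share: float = 0.2) -> list[str]:
    """Values of `field` (e.g. 'author' or 'site') behind at least `min_share` of mentions.

    One account or forum producing a large share of the volume is a typical
    junk indicator, but it still needs checking by hand before you exclude it.
    """
    counts = Counter(m[field] for m in mentions if m.get(field))
    total = sum(counts.values())
    return [value for value, n in counts.most_common() if n / total >= min_share]
```

For example, `with_exclusions("acme", ["best deal", "replica handbag"], ["spammy.example"])` produces `(acme) AND NOT ("best deal" OR "replica handbag" OR site:spammy.example)`. Keeping the lists in code or a file makes it easy to reuse the same blacklist across Queries.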

You can find more top tips for getting clean data here.

Categories
Guide Monitoring