We crawl data from over 80 million sources (and growing) in 44 languages. We collect and store our own data, meaning we’re always in control and can add sources on demand.
However, quantity isn’t everything: data is useless if it’s irrelevant. That’s why we focus on industry-leading spam and duplicate detection, as well as advanced query functionality, so you get clean, accurate, high-quality data.
We have developed sophisticated algorithms that analyze word frequency and other indicators of spam. We also run industry-specific searches that target common spam phrases, such as ‘best deal’ or ‘replica handbag’.
Additionally, we maintain blacklists of known bad sites, built from customer feedback and analyst reviews, and we run keyword-density checks to detect text written purely for SEO.
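To make these checks concrete, here is a minimal, illustrative Python sketch. It is not our production pipeline. It assumes each document arrives as a URL plus plain text. The phrase list reuses the two examples above, while the blacklisted domain, the stopword list, the 8% density threshold, and the 50-word minimum are hypothetical placeholders chosen only for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical values for illustration only; these are NOT production settings.
SPAM_PHRASES = ("best deal", "replica handbag")
BLACKLISTED_DOMAINS = {"spam-site.example"}
MAX_KEYWORD_DENSITY = 0.08  # hypothetical: top word may fill at most 8% of content words
MIN_CONTENT_WORDS = 50      # hypothetical: skip the density check on very short texts
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def max_keyword_density(tokens: list[str]) -> float:
    """Share of content words taken up by the single most frequent word."""
    content = [t for t in tokens if t not in STOPWORDS]
    if len(content) < MIN_CONTENT_WORDS:
        return 0.0  # too short for density to be a meaningful signal
    _, top_count = Counter(content).most_common(1)[0]
    return top_count / len(content)


def spam_reasons(url: str, text: str) -> list[str]:
    """Return the reasons a document looks like spam (an empty list means it passed)."""
    reasons = []
    # 1. Blacklist: reject documents from known bad hosts.
    host = urlparse(url).hostname or ""
    if host in BLACKLISTED_DOMAINS:
        reasons.append("blacklisted domain")
    # 2. Phrase matching: look for common spam phrases.
    lowered = text.lower()
    for phrase in SPAM_PHRASES:
        if phrase in lowered:
            reasons.append(f"spam phrase: {phrase!r}")
    # 3. Keyword density: flag text stuffed with one repeated word.
    density = max_keyword_density(tokenize(text))
    if density > MAX_KEYWORD_DENSITY:
        reasons.append(f"keyword density {density:.2f}")
    return reasons


if __name__ == "__main__":
    print(spam_reasons("https://spam-site.example/page", "Best deal on a replica handbag"))
    print(spam_reasons("https://news.example/", "cheap " * 60 + "news " * 10))
```

Returning a list of reasons rather than a single yes/no lets one strong signal, such as a blacklisted domain, flag a document on its own while still recording which checks fired; the minimum-length guard keeps short snippets from tripping the density check by accident.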
Our goal is to keep spam to a minimum while continuously increasing the number of sources we crawl each day.