How Web Data Harvesting Can Be Used to Combat Counterfeiting

The World Trademark Review recently published a startling commentary on a study in an article titled “We are failing”: study reveals $461 billion international trade in counterfeit and pirated goods. The article details how companies are falling short in combating counterfeiting online. In this post, we cover how harvesting web data can help companies stop “failing”.

The Counterfeit Problem is Out of Control

The study referenced in the article “revealed that counterfeit and pirated goods represented up to 2.5% of world trade in 2013 – a figure that Antonio Campinos, president of the EU Intellectual Property Office (EUIPO), noted ’is equivalent to combined GDP of the Czech Republic and Ireland’.”

The article includes several other key findings, but the one that caught our eye was that “almost 20% of the total value of seized products refers to IP rights of holders registered in the United States (followed by Italy (14.6%), France (12.1%), Switzerland (11.7%), Japan (8.2%) and Germany (7.5%)).”

Intellectual property (IP) is a critical company asset, and it is constantly being attacked and exploited online. Many companies have internal departments and programs that try to find and shut down online offenders, but most struggle because of the sheer size of the internet. This is where data harvesting can help.

Stop Failing with Data Harvesting

As the study outlines, fraudulent use of trademarks, products, and brands costs the legitimate companies that own the intellectual property billions in lost revenue. With data harvesting, the businesses that own those rights can use web data to identify and stop fraudulent use of their property.

How It Works

Many customers in the intellectual property management space don’t initially have a complete grasp of where their products are being sold online. The first step in getting a data harvesting initiative going with BrightPlanet is source identification.

During source identification, Data Acquisition Engineers (DAEs) use several proprietary methods to identify both established websites and new domains as potential targets. The DAE team then puts the targeted websites through the harvest and curate steps.

The harvest step takes the targeted websites and automatically harvests the full text and other helpful data from each web page. To construct insight from this data, it is then curated.
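To make the idea of harvesting "full text" concrete, here is a minimal sketch of pulling the visible text out of one page's HTML. It uses only Python's standard library and is an illustration of the general technique, not BrightPlanet's harvester; the names TextExtractor and extract_full_text are our own for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping script and style contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a <script> or <style> element
        self._chunks = []      # visible text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_full_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A real harvest would also need to fetch pages, follow links, and handle sites that render content with JavaScript, all of which this sketch leaves out.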

Curation organizes the data and identifies the key information specified by the customer, including product names, sale prices, quantities, email addresses, phone numbers, names, places, and Bitcoin addresses.
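As a rough illustration of what curation can look like, the sketch below pulls a few of these fields out of harvested text with regular expressions. The patterns and the curate function are assumptions made for this example, not BrightPlanet's curation rules: the price pattern only handles US dollar amounts, and the Bitcoin pattern only covers legacy addresses that start with 1 or 3.

```python
import re

# Simplified patterns for illustration; production rules would be stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# US dollar amounts such as $49.99 or $1,049.99.
PRICE_RE = re.compile(r"\$\s?(\d+(?:,\d{3})*(?:\.\d{2})?)")
# Legacy Base58 Bitcoin addresses (start with 1 or 3); newer formats are not covered.
BTC_RE = re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b")


def curate(text):
    """Extract emails, USD prices, and Bitcoin addresses from harvested text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "prices_usd": [float(p.replace(",", "")) for p in PRICE_RE.findall(text)],
        "bitcoin_addresses": BTC_RE.findall(text),
    }
```

Fields such as product names, people, and places usually need more than pattern matching, for example a product catalog to match against or an entity recognition step.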

The curated data is delivered in whatever form the customer prefers. Some customers receive the data programmatically through BrightPlanet’s REST API, while others feed it into a customizable data visualization dashboard.

Regardless of how the customer receives and analyzes the data, data harvesting helps answer questions such as:

  • Which products are targeted the most and by whom?
  • What fraudulent websites selling the product have in common?
  • What information is being added and removed from pages selling the products?
  • What prices are the products sold at in different regions of the world? This helps measure the size of the problem and the loss associated with it.
  • What is the preferred transaction method for sales of the counterfeit product (payment processors, Bitcoin, etc.)?
  • Which countries host these sites? This can be mapped against potential legal actions based on jurisdiction and the severity of the counterfeit sales.
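Several of the questions above come down to counting curated records by a field. The sketch below shows one way that could be done. The record layout (product, payment_method, and host_country keys) is a hypothetical format chosen for this example, not the schema of any BrightPlanet deliverable.

```python
from collections import Counter


def summarize(records):
    """Rank products, payment methods, and host countries by listing count.

    Each record is assumed to be a dict with the hypothetical keys
    'product', 'payment_method', and 'host_country'.
    """
    return {
        "top_products": Counter(r["product"] for r in records).most_common(),
        "payment_methods": Counter(r["payment_method"] for r in records).most_common(),
        "host_countries": Counter(r["host_country"] for r in records).most_common(),
    }
```

Each value is a list of (value, count) pairs sorted from most to least frequent, so the first entry of "top_products" would be the product that appears in the most listings.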

How to Get Started

If you are interested in adding data harvesting to your intellectual property protection, sign up to schedule a free consultation. We are experts in combining traditional risk management practices with the use of web intelligence.

