Deep Web University
Discover over 200 articles we’ve written about the Deep Web in our Deep Web University.
Illegal online pharmaceutical sales contribute to billions of dollars in lost revenue for pharmaceutical companies each year. More damaging than the lost revenue is the potential loss of life when consumers take counterfeit and fraudulent drugs ordered online.
In this walk-through, we show how harvesting technology can be used to harvest and curate web data, and then develop insights into the world of online pharmaceutical sales.
The first step in tackling illegal online pharmaceutical sales was creating a system that can identify websites that sell drugs online. To do this, we asked ourselves how people currently find these websites and make purchases. We used the answers to those questions to automate the process of finding new domains. Some of the techniques we implemented were:
In addition to this reactive approach, which found established domains, we also monitored lists of newly registered .COM domains in real time. This meant we were identifying domains even before customers visited the sites. Approximately 100,000 domains are registered daily, and BrightPlanet monitored those registrations for indicators in the domain name, such as RX and drug names (FreeViagra.com, CanadaRX.com, etc.).
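The domain-name screening described above can be sketched roughly as follows. This is a minimal illustration, not BrightPlanet's actual system: the function name flag_domains and the INDICATORS list are assumptions made for the example, and plain substring matching will produce false positives that a real pipeline would need to filter.

```python
# Hypothetical indicator list; the article only names "RX" and drug names as examples.
INDICATORS = ("rx", "viagra", "pharma")


def flag_domains(domains, indicators=INDICATORS):
    """Return the domains whose name contains any indicator substring."""
    flagged = []
    for domain in domains:
        # Compare case-insensitively and ignore the top-level domain.
        name = domain.lower().rsplit(".", 1)[0]
        if any(token in name for token in indicators):
            flagged.append(domain)
    return flagged


# Example batch of newly registered domains (the first two come from the article).
new_registrations = ["FreeViagra.com", "CanadaRX.com", "gardeningtips.com"]
print(flag_domains(new_registrations))
```

In this sketch the two pharmacy-style names are flagged and the unrelated domain is passed over; flagged domains would then be queued for harvesting.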
Once websites were identified, we further curated the harvested web pages to help develop insights. This involved collecting all the text from each page within the identified domains and identifying important terms on each page. We needed to identify key terms that would help us accomplish our goals: first, uncovering which drugs were being targeted the most and, second, determining who the biggest offenders were. With those goals in mind, we extracted the following entities:
The Drug Name, Dosage, and Dollar Amount were extracted to understand which drugs were being targeted.
Payment processors and purchase types were extracted to give insight into the legitimacy of the online pharmacy.
Contact information such as email addresses and phone numbers was tagged to help connect pharmacies to each other.
External links were analyzed to create networks of online pharmacies that may be linking to identical sites.
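Extraction of several of the entities named in this section (drug name, dosage, dollar amount, email address, phone number) could look something like the sketch below. The regular expressions, the DRUG_NAMES watch list, and the function name extract_entities are all assumptions for illustration; the article does not describe the actual extraction rules, and real pages would need far more robust patterns.

```python
import re

# Assumed watch list of drug names; not taken from the article.
DRUG_NAMES = ("viagra", "cialis")

# Simplified, illustrative patterns (US-style phone numbers, dollar prices).
DOSAGE_RE = re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|ml)\b", re.IGNORECASE)
PRICE_RE = re.compile(r"\$\d+(?:\.\d{2})?")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\b\d{3}[\s.-]\d{3}[\s.-]\d{4}\b")


def extract_entities(text):
    """Pull drug names, dosages, prices, emails, and phone numbers from page text."""
    lowered = text.lower()
    return {
        "drugs": [name for name in DRUG_NAMES if name in lowered],
        "dosages": DOSAGE_RE.findall(text),
        "prices": PRICE_RE.findall(text),
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


page_text = (
    "Buy Viagra 100mg for $1.50 per pill. "
    "Call 555-123-4567 or email sales@example.com today."
)
print(extract_entities(page_text))
```

Each harvested page would yield one such record, which can then be aggregated per domain to see which drugs and contact details recur.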
In addition to the data harvested from the online pharmacy pages, we also paired it with a few external datasets for further analysis.
Once the data was curated, we could begin developing insights from the data. We offered a few different options for drug manufacturers to ingest and analyze the harvested data.
Use data visualization to quickly uncover drugs being sold online, then filter to see where those drugs are sold.
Gain insight into online pharma hubs and which domains share connections based on phone numbers harvested from domains.
Gain insight into online pharma hubs and which domains share connections based on email addresses harvested from domains.
Identify where products are being targeted by pairing harvested data with WHOIS lookup information.
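One way to find domains that share connections through harvested phone numbers or email addresses is to group them into connected clusters. The sketch below uses a small union-find structure for this; it is an assumed approach for illustration, not necessarily how BrightPlanet builds its hub views, and the function name group_by_shared_contacts and the sample domains are hypothetical.

```python
from collections import defaultdict


def group_by_shared_contacts(domain_contacts):
    """Cluster domains that are linked by a shared phone number or email.

    domain_contacts maps each domain to the set of contact strings harvested from it.
    """
    parent = {domain: domain for domain in domain_contacts}

    def find(domain):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[domain] != domain:
            parent[domain] = parent[parent[domain]]
            domain = parent[domain]
        return domain

    # Link every domain to the first domain seen with the same contact.
    first_seen = {}
    for domain, contacts in domain_contacts.items():
        for contact in contacts:
            if contact in first_seen:
                parent[find(domain)] = find(first_seen[contact])
            else:
                first_seen[contact] = domain

    clusters = defaultdict(set)
    for domain in domain_contacts:
        clusters[find(domain)].add(domain)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Hypothetical harvested contacts per domain.
harvested = {
    "pharma-a.example": {"555-123-4567"},
    "pharma-b.example": {"555-123-4567", "sales@pharma.example"},
    "pharma-c.example": {"sales@pharma.example"},
    "unrelated.example": {"info@unrelated.example"},
}
print(group_by_shared_contacts(harvested))
```

Here the first three domains end up in one cluster because a phone number links the first two and an email address links the second and third, while the fourth domain stays on its own.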
Learn how you can take advantage of web data and begin developing insights for your own business.
Schedule a free consultation with a BrightPlanet® Data Acquisition Engineer today.