Why You Should Tap into the Deep Web in 2014

Deep Web in the News

By now, you have probably heard the term “Deep Web,” which has been used widely in the media. Since the shutdown of the now infamous website Silk Road, which was hosted on the anonymous Tor network, the mainstream media has incorrectly been using “Tor network” and “Deep Web” interchangeably. As a quick refresher, the Deep Web can be defined as any content that a search engine cannot reach by crawling links on web pages.

It can be argued that the Tor network is part of the Deep Web, but it’s important to note that the Tor network and sites such as Silk Road are a very small portion of it. The vast majority of Deep Web content sits on the public Internet, and you are more than likely already accessing it one search at a time. Google, Bing, and Yahoo cannot access this content because you need to enter search criteria into a web form to get it.

Some Deep Web examples that you’re likely already using

1 – Travel Sites

Site: Hotwire

URL: http://www.hotwire.com

Travel sites like Hotwire and Expedia are perfect examples of where you are already finding and searching Deep Web content. You’ll never find the results of a Hotwire or Expedia search in a standard search engine, because you need to go directly to the source and enter search criteria to get to the data.

2 – Individual Government Database

Site: North Dakota State Court Record

URL: http://publicsearch.ndcourts.gov/Search.aspx?ID=100

The North Dakota court records site linked above is another example of a site with a large amount of Deep Web content. The search box on the form points to a database containing all the public court record information. Google, Bing, and Yahoo cannot access this data because you cannot get to it by clicking links. You need to enter a query in the search box, in this case a person’s name.
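To make the difference between clicking links and submitting a query concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the tiny in-memory site (PAGES), the records behind the form (RECORDS), and the fetch and crawl helpers are assumptions, not how any real search engine or the North Dakota site works. It shows that a crawler that only follows links never reaches the record, while a direct query does.

```python
from html.parser import HTMLParser

# Hypothetical in-memory site: each path maps to the HTML a server would return.
PAGES = {
    "/": '<a href="/about">About</a><form action="/search"><input name="name"></form>',
    "/about": '<a href="/">Home</a>',
}

# Hypothetical records that sit behind the search form.
RECORDS = {"smith": "Case 123: Smith v. Jones"}


def fetch(path):
    """Return the page body for a path; /search?name=... queries RECORDS."""
    prefix = "/search?name="
    if path.startswith(prefix):
        return RECORDS.get(path[len(prefix):].lower(), "No results")
    return PAGES.get(path, "")


class LinkParser(HTMLParser):
    """Collects href values from <a> tags, as a link-following crawler would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)


def crawl(start):
    """Visit every page reachable from start by following links only."""
    seen, queue = set(), [start]
    while queue:
        path = queue.pop()
        if path in seen:
            continue
        seen.add(path)
        parser = LinkParser()
        parser.feed(fetch(path))
        queue.extend(parser.links)
    return seen


surface = crawl("/")                   # pages a link crawler can discover
record = fetch("/search?name=Smith")   # content that needs a typed-in query
print(sorted(surface))
print(record)
```

The crawler ends up with only the home and about pages, because the form gives it nothing to click. The court record appears only when something supplies a name, which is what a person does when typing into the search box.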

The Web is getting more complicated

We often get asked how big the Deep Web is. Unfortunately, the Internet has grown so large that this question has become impossible to answer even for the Surface Web, let alone the Deep Web. Part of the reason for this is how dynamic interactions with the Web have become.

Take the e-commerce site Amazon.com, for instance; predicting the size of even this one domain is incredibly difficult. Amazon sells hundreds of millions of different items on its website, but each page that Amazon displays is customized to you based on your previous interactions with the site: your earlier searches, previous purchases, and so on.

The page that you are viewing to buy that new laptop also displays the results of your previous searches, allowing Amazon to remind you what else you may be interested in purchasing. The same product page for that laptop will likely look different to you a week from today, as different products are recommended to you based on your recent Amazon activity. The individual page you are currently viewing to purchase the latest Kindle Fire therefore counts as one unique page within Amazon, a page that no one else will ever see because it was made just for you.

This complexity and the dynamic nature of the Internet, along with the sheer number of domains or unique websites (the most recent estimates put it at 148 million), have made the total size of the Internet nearly impossible to estimate and classify, both on the Surface Web and in the Deep Web.

Who is using enriched Surface Web and Deep Web content?

You may be wondering which industries are currently tapping into content from the Surface Web and Deep Web and seeing success. At BrightPlanet, we specialize in offering harvesting and data services to help end users collect and analyze Deep Web content at Big Data scale. We not only harvest the content, but also enrich and tag it to give you or your customers output that is usable.

The short case studies below cover industries and organizations that are benefiting from tapping into data on the Deep Web.

Harvesting Data to Fight Counterfeiting for the Pharma Community
Who is Powered by BrightPlanet?

The Pharmaceutical Community: The pharmaceutical community is estimated to be losing $200 billion per year due to the sale of fraudulent and counterfeit drugs online. Not only is revenue being lost, but customer safety is also in jeopardy. The pharma community has become Powered by BrightPlanet to help combat the sale of these drugs, increasing customer safety and decreasing profit loss for the industry.

What data do they want?

Online Pharmacy Data: BrightPlanet harvests content from all known online pharmacies that sell fraudulent pharmaceutical products. It then extracts over 45 entities or indicators from these sites to identify which sites are bigger threats for online pharmacy fraud and how the sites may be related.

Why Powered by BrightPlanet?

Reduce Overhead: Major pharma uses the enriched data sets from the online pharmacies to target hubs of pharmacies selling drugs online. Using the data BrightPlanet provides, major pharma can shut down hundreds or thousands of domains at a time rather than targeting one individual site.

Uncovering Hiring Trends of Fortune 1000 Companies
Who is Powered by BrightPlanet?

An HR Staffing Company: A human resource strategy consulting company that helps Fortune 1000 customers globalize their business and improve their people strategy is currently Powered by BrightPlanet.

What data do they want?

Fortune 1000 Job Postings: Being in the business of staffing and human resources, the consulting organization wanted a way not only to automatically harvest and monitor job postings, but also to quickly make sense of the data. BrightPlanet not only harvested job postings by Fortune 1000 companies, but also extracted the title, location, certifications, and key qualifications from each one.

Why Powered by BrightPlanet?

Offer New Services and Increase Market Share: The HR company was able to take the harvested and enriched data, integrate it directly and easily into its current infrastructure, and allow its current customers to interact with and analyze the datasets in the dashboard they were already using.

Managing the Fast-Changing Environment of Electronic Health Record Implementation
Who is Powered by BrightPlanet?

The Rockville Institute: A nonprofit that provides resources for health professionals wanted to create a platform where health professionals could collaborate and solve real-world problems related to planning, implementing, and optimizing electronic health records (EHR). The business of healthcare IT is constantly changing as it relates to EHR. The content about EHR implementation that can be found online is almost impossible to track and keep up with manually.

What data do they want?

Up-to-date curated information relating to EHR implementation: BrightPlanet harvested and provided up-to-date articles about EHR implementation from the news, professional organization sites, government sites, and journals.

Why Powered by BrightPlanet?

Offer New Services: BrightPlanet’s data was harvested and built directly into the HealthITXChange site, giving customers the opportunity to interact with up-to-date, trusted information all in one place.

How to Start Tapping into the Deep Web

Want to learn more about how you can become Powered by BrightPlanet? Request a free demo today.

Not quite sure about the Deep Web yet? Download our free whitepaper, Understanding the Deep Web in 10 Minutes.



Photo: Roxnstix