Earlier this week, Forbes released an article titled “Insider Trading on the Dark Web”. The article mentioned BrightPlanet and introduced it as a company that collects content from what is called the Dark Web. While we appreciate being mentioned in Forbes, there are a few definitions we want to clear up for readers.
With the media’s recent emphasis on the Silk Road shutdown, we’ve found a significant misunderstanding of the terms Surface Web, Deep Web, and Dark Web. We hope this post can guide you through these often-confused terms and give you a better understanding of how the web works. Along the way, you’ll see that Forbes’ definition of Dark Web content was inaccurate. Let’s get started.
Starting with the Surface
To start our journey through the different parts of the web, we’ll begin with the surface: the part you’re most familiar with. The Surface Web is anything that can be indexed by a typical search engine like Google, Bing or Yahoo. Google has a great interactive story explaining in depth how it indexes and searches the web.
To help you understand how search engines work, I want you to open a traditional news or blog site (CNN, BBC, etc.) and begin clicking different links to new article pages. Once you have finished doing that, come back to the blog posting.
If you’re done clicking links, you’ve just behaved the way a search engine’s crawling technology does when it finds and identifies websites. Search engines rely on pages that contain links to find and identify content. This is a great way to find new content on the web that most people care about (blogs, news, etc.). But this technique of navigating links also misses a lot of content. Let’s go a little deeper to find out exactly what type of content is missed.
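The link-clicking exercise can be sketched in a few lines of Python. This is only a minimal illustration, not how any real search engine is built: the `SITE` dictionary is a made-up, in-memory stand-in for a website, and `extract_links` and `crawl` are hypothetical helper names. The crawler starts at the home page, follows every link it sees, and never learns about a page that nothing links to.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "website": path -> HTML. These are not real pages.
SITE = {
    "/": '<a href="/news">News</a> <a href="/blog">Blog</a>',
    "/news": '<a href="/news/story-1">Story 1</a> <a href="/">Home</a>',
    "/news/story-1": '<a href="/news">Back to news</a>',
    "/blog": '<a href="/blog/post-1">Post 1</a>',
    "/blog/post-1": "<p>No outgoing links here.</p>",
    "/orphan": "<p>No page links to this one.</p>",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def crawl(site, start="/"):
    """Breadth-first crawl that discovers pages only by following links."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        path = queue.popleft()
        order.append(path)
        for link in extract_links(site.get(path, "")):
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


found = crawl(SITE)
missed = sorted(set(SITE) - set(found))
print("found:", found)
print("missed:", missed)  # pages the link-following crawl never reached
```

In this toy site, `/orphan` exists but is never discovered, because no other page links to it.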
Moving a Little Deeper
From a purist’s definition standpoint, the Surface Web is anything that a search engine can find while the Deep Web is anything that a search engine can’t find. The Forbes article that we mentioned previously used BrightPlanet’s definition for the Deep Web as the definition for the Dark Web. There are a number of reasons that a search engine can’t find data on the web; today we plan on covering the most common one.
Remember how we had you open up a web page and crawl links? Now I want you to stop and open up a different web page; let’s use the travel site Hotwire this time. I have a challenge for you: I want you to attempt to find the price of a hotel in Sioux Falls, S.D. (BrightPlanet’s headquarters) from April 10 to 12 (Sioux Falls is still cold in April). But wait, there’s a catch: you can only interact with the site like a standard search engine would, meaning you can only click links to get there.
There’s a nice search box that Hotwire allows users to fill out, but you can’t use it. Search engines don’t use search boxes; they just use links. You’ll quickly find that you can’t reach the search results you are looking for without a search box. The results of a Hotwire search are perfect examples of Deep Web content.
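The same challenge can be sketched in Python. Everything here is invented for illustration: `handle_request` is a toy stand-in for a travel site (not Hotwire’s actual behavior), and the rate in `RATES` is a made-up number. The results page exists only as the response to a filled-in form, so a crawler that follows links alone can see that a form is there but never reaches what is behind it.

```python
from html.parser import HTMLParser

# Made-up rate table for illustration only; not real pricing data.
RATES = {("Sioux Falls", "04-10", "04-12"): 89}

# Hypothetical home page: one ordinary link and one search form.
HOME = (
    '<a href="/about">About</a>'
    '<form action="/search" method="get">'
    '<input name="city"><input name="checkin"><input name="checkout">'
    "</form>"
)


def handle_request(path, params=None):
    """Toy server: static pages for links, a generated page for the form."""
    if path == "/":
        return HOME
    if path == "/about":
        return '<a href="/">Home</a>'
    if path == "/search" and params:
        key = (params["city"], params["checkin"], params["checkout"])
        price = RATES.get(key)
        if price is None:
            return "<p>no results</p>"
        return f"<p>price: {price}</p>"
    return None  # nothing to serve without form input


class PageScanner(HTMLParser):
    """Records link targets and form actions separately."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form" and attrs.get("action"):
            self.forms.append(attrs["action"])


def link_only_crawl(start="/"):
    """Follows <a> links only; forms are noticed but never submitted."""
    seen, stack, skipped_forms = set(), [start], set()
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        html = handle_request(path)
        if html is None:
            continue
        seen.add(path)
        scanner = PageScanner()
        scanner.feed(html)
        stack.extend(scanner.links)
        skipped_forms.update(scanner.forms)
    return seen, skipped_forms


crawled, forms = link_only_crawl()
print("crawled:", sorted(crawled))
print("forms left unsubmitted:", sorted(forms))

# A person filling in the search box reaches content the crawler never saw.
query = {"city": "Sioux Falls", "checkin": "04-10", "checkout": "04-12"}
result_page = handle_request("/search", query)
print("human search result:", result_page)
```

The crawl covers the home page and the about page, while the hotel price appears only once someone supplies a city and dates. Real crawlers are more sophisticated than this sketch, but the basic limitation is the same one described above.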
Other examples of Deep Web content can be found almost anytime you navigate away from Google and do a search directly in a website. Government databases and libraries, for example, contain huge amounts of Deep Web data.
Google search can’t find the pages behind these website search boxes. Most of the content located in the Deep Web sits behind websites that require a search, and it is not illicit or scary as the media portrays. However, if you go a little deeper into the Internet, you’ll find the Dark Web.
Getting a Little Darker
Continuing with our definitions, we’ve learned that the Surface Web is anything that a search engine can access and the Deep Web is anything that a search engine can’t access. The Dark Web then is classified as a small portion of the Deep Web that has been intentionally hidden and is inaccessible through standard web browsers.
The most famous content that resides on the Dark Web is found on the Tor network. The Tor network is an anonymity network that is typically accessed with a special web browser called Tor Browser. This is the portion of the Internet most widely known for illicit activities because of the anonymity associated with the Tor network.
The key thing to keep in mind is that the Dark Web is a small portion of the Deep Web. Some media outlets are defining both terms inaccurately, and we want to do our best to clear up the confusion.
Want to learn more about the Deep Web? Download our whitepaper, Understanding the Deep Web in 10 Minutes, which includes some of the information you just read and builds on it.
At BrightPlanet, we help customers find the data they want on the Deep Web, harvest it and make it usable. The buzzword Big Data is permeating every industry and we provide data-as-a-service to help organizations harness and use Big Data from the web.
Learn more about our Data-as-a-Service here.
Featured Image: Manveet Singh