Common Deep Web and Big Data Questions Answered – Part 1

Our visitors ask us lots of challenging questions about Big Data and the Deep Web, such as what the Deep Web is, what data is available on it, and how big it is. We get some of these questions so frequently that we decided to gather them, along with related resources, in a series of posts. Below you’ll find our quick answer to each question, with a link to a more in-depth blog post that we hope you’ll find as valuable as we do.

We’ve broken these questions up into a two-part series. Part 1 focuses on questions related to the Deep Web and how we get data from it. Part 2 will focus on questions about Big Data and how we enrich and structure it.

Question 1: What’s the difference between the Surface Web, Deep Web, and the Dark Web?

Short Answer:

The Surface Web is anything that can be found by a search engine. It normally consists of Web content that is linked and can be reached by following clickable links. The Deep Web refers to any portion of the Internet that cannot be found by a standard link-crawling search engine. The vast majority of this content is out of a crawler’s reach because you need to go through a Web search form or enter a query to get to it.

The Dark Web refers to anything that is intentionally hidden. The best-known part of the Dark Web is the Tor network, a private, anonymous network that can only be accessed with a special browser. The Dark Web makes up a small portion of the Deep Web.
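To make the Surface Web and Deep Web distinction concrete, here is a minimal Python sketch of a link crawler running over a tiny, made-up site. The page names and the `LINKS` map are hypothetical and exist only for illustration. Pages the crawler can reach by following links play the role of the Surface Web; the results page that sits behind a search form is never linked to, so the crawler never finds it.

```python
from collections import deque

# Hypothetical toy site: each page maps to the pages it links to.
LINKS = {
    "/home": ["/about", "/search"],
    "/about": ["/home"],
    "/search": [],  # a search form; its result pages are not linked anywhere
    "/results?q=deep+web": ["/record/42"],  # only reachable by submitting a query
    "/record/42": [],
}


def crawl(start, links):
    """Breadth-first link crawl: return every page reachable by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


surface = crawl("/home", LINKS)   # what a link-crawling engine can find
deep = set(LINKS) - surface       # what stays hidden behind the search form

print("Surface:", sorted(surface))
print("Deep:", sorted(deep))
```

In this toy example the record page is perfectly public, yet it stays invisible to the crawler simply because nothing links to the query that produces it.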

Link to Blog Post: Clearing Up Confusion – Deep Web vs. Dark Web

Question 2: How big is the Deep Web?

Short Answer:

In 2001, BrightPlanet completed a study to estimate the size of the Deep Web. Our initial findings revealed that search engines were searching only 0.03% of the pages available on the entire Internet. BrightPlanet has not completed any additional studies to estimate the size of the Deep Web since then because of how much the Internet has grown. It has become so vast that we now classify the Deep Web as infinite.

Link to Blog Post: How Big is the Internet?

Question 3: How does BrightPlanet harvest Deep Web content?

Short Answer:

BrightPlanet has patented technology to automate the process of directing queries into sites that have Web search forms. Our Deep Web Harvester places the queries directly into search forms at large scale and harvests the results of those queries to provide you with Deep Web content.
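The general idea of directing queries into a search form can be sketched in a few lines of Python. This is not BrightPlanet’s patented technology, only an illustration under simplifying assumptions: the form is assumed to submit with a plain HTTP GET, and the URL `https://example.com/search`, the field name `q`, and the function `build_form_queries` are all hypothetical. The sketch only builds the request URLs a harvester would fetch; it does not contact any site.

```python
from urllib.parse import urlencode


def build_form_queries(action_url, field_name, terms):
    """Build one GET request URL per query term for a search form.

    action_url and field_name stand in for the form's action and its
    text input name; both are assumptions made for this illustration.
    """
    return [f"{action_url}?{urlencode({field_name: term})}" for term in terms]


urls = build_form_queries(
    "https://example.com/search",  # hypothetical search form endpoint
    "q",                           # hypothetical input field name
    ["deep web", "big data"],
)

for url in urls:
    print(url)
```

A real harvester would then fetch each URL, follow the links on the results pages, and store the content it finds. Forms that submit with POST, require sessions, or paginate their results need more handling than this sketch shows.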

Link to Blog Post: What is a Deep Web Harvest?

Question 4: What’s the difference between BrightPlanet and a Google search?

Short Answer:

BrightPlanet harvests data rather than indexing sites. When we harvest data, we extract all of the text from each individual webpage that our harvester visits. When Google indexes data, it doesn’t extract all of the text content. Google only stores a temporary reference to what it thinks is important, usually the most frequently mentioned keywords. BrightPlanet harvests are also directed, allowing you to define the pool of data to be collected.
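The contrast between keeping all of a page’s text and keeping only its most mentioned keywords can be illustrated with a short Python sketch. It is a simplified model of the distinction described above, not a description of how Google or the BrightPlanet harvester is actually implemented. The sample page and the names `harvest_text` and `top_keywords` are made up for the example.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect every non-empty text node found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def harvest_text(html):
    """'Harvest' view: keep all of the page's text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def top_keywords(text, n=2):
    """'Index' view (simplified): keep only the n most frequent words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word, _ in Counter(words).most_common(n)]


# Hypothetical sample page; it has no script or style blocks to filter out.
PAGE = (
    "<html><body><h1>Deep Web</h1>"
    "<p>Most of the deep web sits behind search forms, not links.</p>"
    "</body></html>"
)

full_text = harvest_text(PAGE)
keywords = top_keywords(full_text)

print(full_text)
print(keywords)
```

The harvested text preserves the full sentence, so it can be searched or analyzed later for anything it contains. The keyword summary drops everything except the two most frequent words.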

Link to Blog Post: Why Deep Web Harvesting is Different than a Google Search

Learn More

Want to learn more about the Deep Web, Dark Web, and Surface Web? Download our ‘Understanding the Deep Web in 10 Minutes’ white paper.

Stay tuned for Part 2 focusing on the Big Data available on the Deep Web.

Photo: Raymond Bryson