History of Search and Deep Web Harvesting [Throwback Thursday]

Welcome to our first Throwback Thursday blog posting. In honor of the occasion, we wanted to turn back time with a quick recap of the history of search and where BrightPlanet’s technology fits into that history. In this blog posting we will cover:

How search first started and evolved
How BrightPlanet’s history fits in with search

Before you dig in, for those of you not familiar with the Deep Web, please download our whitepaper Understanding the Deep Web in 10 Minutes.

Beginning of Search

To start, let’s roll back time to the mid-’90s, before the major search engines we use today existed. BrightPlanet’s harvesting technology dates back to this time, when Yahoo! and AltaVista were the big search engines, the Clinton Administration was still in office, and Yahoo! offered only a hand-built directory of “surface” Web sites. In other words, Yahoo! was manually adding web pages to a web portal rather than collecting them in the automated fashion we see today. To increase efficiency, search engines began using a technique called spidering to automate this process.

Automating of Search

Put simply, spidering (also commonly called link crawling) is the process of automatically navigating to webpages through links. Just as you would navigate away from this blog posting by clicking a link, a spider follows links from page to page, but at a much larger scale. This gave search engines a way to automatically find new websites and add them to their indexes, redefining the concept of search. Search engines could now deliver up-to-date, relevant content to the end user in near-real time. Spidering is a great way to find and index content quickly. However, it misses quite a bit of content, and that missed data created a vast opportunity for technology companies. BrightPlanet acted on that opportunity by coining the term “the Deep Web” in 2001 and developing technology to access the missed data.
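To make the idea concrete, here is a minimal sketch of link crawling in Python. It is an illustration only, not BrightPlanet’s or any search engine’s actual implementation. The toy “web” (TOY_WEB), its page names, and the crawl function are all hypothetical, and the pages live in memory so nothing is fetched over the network. The /results page stands in for content that is only produced by a search form, so no link points to it.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(pages, seed):
    """Breadth-first link crawl over `pages`, a dict mapping URL -> HTML.

    Returns the URLs in the order they were visited (the "index").
    """
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue  # the link points outside our toy web
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical toy web. /results is reachable only by submitting the form
# on /blog, so a crawler that follows <a> links never discovers it.
TOY_WEB = {
    "/home": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/home">Home</a>',
    "/blog": '<a href="/about">About</a> <form action="/results"></form>',
    "/results": "<p>Database records returned by a query</p>",
}

indexed = crawl(TOY_WEB, "/home")
print(indexed)  # ['/home', '/about', '/blog']
```

Starting from /home, the crawler reaches every page that some link points to, but /results never makes it into the index. That gap between what exists and what links can reach is the content a link crawler misses.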

BrightPlanet realized early on that as the Internet grew, the move to publish dynamic data to the Internet through light Web interfaces and back-end databases would grow exponentially. This new data was “invisible” to the crawlers, or spiders, used by search engines, which could only index content that was linked. BrightPlanet’s original product suite consisted of Search Licenses implemented within the existing infrastructure of a number of U.S. intelligence agencies. The Deep Web Harvester License required end users to store and analyze the data within their own environment, a model that didn’t work well for companies that had few analytic resources or did not want to involve their IT departments.

Search and Deep Web Harvesting Today

Fast forward to 2013, 12 years later: “Deep Web” has become the widely accepted term for content that search engines cannot access, and spidering is still the process search engines widely use to index websites. With the exponential growth of the Internet, more data than ever before now resides in the Deep Web. BrightPlanet today is still the only company that can offer that Deep Web data at scale, with 8 active U.S. patents protecting its collection technology.

BrightPlanet’s harvesting technologies have evolved from the once piecemeal Deep Web search licenses for U.S. intelligence agencies into new products and services, including end-to-end platform solutions for both government and commercial entities. BrightPlanet’s core harvest engine has gone through 8 major revisions over the years and has been ported from its original C++ code base to Java. Importantly, William Bushee, BrightPlanet’s Vice President of Technology, is still with the company. He served as the lead engineer during the technology’s inception and has led BrightPlanet’s development team through all 8 major revisions. To learn more about the products and services, visit our product and solution page here.

The Future of Search

The question remains, however: “What is the future of search, and how can the search experience be further improved?” Tune in next week for our blog posting uncovering the technologies that may be used to further advance your search experience.

Photo: 55Laney69