Deep Web Source Repository: Stop Searching Site by Site

Our last post talked about why you should tap into the Deep Web in 2013. We now want to talk about what you can tap into on the Deep Web in 2013.

Is there some project or task in your business that requires you or one of your employees to go from website to website searching? If so, you are going to want to read this post and learn what the Source Repository is and how it can be used. In this post you’ll learn:

  • What the Source Repository is
  • How you can leverage it
  • Why it is unique

What is the Source Repository?

The Deep Web is at least 400-500 times the size of the Surface Web. It is continuously growing, and that means Deep Web sources to be tapped into are also growing. BrightPlanet harnesses Deep Web sources by sorting and indexing them in its Source Repository.

The Source Repository is a library of Deep Web sources/websites that BrightPlanet has collected over 10 years of executing web harvests on behalf of clients.

New sources are added and updated every day. There are currently over 85,000 Deep Web sources in BrightPlanet’s Source Repository, organized into more than 60 groups. Law, Healthcare, Pharmaceuticals, Social Media, Major Media, Newspapers, Finance & Economics, and Politics are just a handful of them.

How can you leverage the Source Repository?

End users do not need to worry about communication with sources; those processes are all done automatically by Deep Web Researchers. You just need to identify the information you are trying to find and BrightPlanet will harvest it on your behalf.

BrightPlanet often works with its end users to harvest content from custom Deep Web sources. End users can define hundreds or thousands of Deep Web sources for BrightPlanet to query with many keywords at once. Once new sources are entered into the Source Repository, they are indexed and saved for future harvests.

Here are just a few examples of how BrightPlanet can leverage the Source Repository for you:

Newspapers

The Newspapers group in the Source Repository includes every newspaper in the U.S. In a matter of seconds, BrightPlanet could harvest topic-specific content from all of them, so you no longer need to search one newspaper website after another. The papers are also sorted by state, so you can limit a search to certain states if that better fits your needs.


Law

The Law group contains several categories, one of which is Courts. The Courts category includes sources that let you instantly search court rulings at every level of the judicial branch: local, state, and federal.

Finance & Markets

Buy the rumor; sell the news. Now you can find both rumors and news faster than the competition by harvesting from the News, Finance Blog/Website, Finance Message Board, and industry-specific blogs and message board source groups.

Why is the Source Repository unique to BrightPlanet?

The technology is exclusive to BrightPlanet. The Source Repository configures websites as Deep Web sources, allowing BrightPlanet’s DeepHarvester to automate queries directly into each site’s search form. Submitting a query directly to the source lets the Harvester go beyond the surface of the site and pull content that can only be reached through the site’s search form. Other harvest technologies cannot access web search forms at scale without custom development for each source.
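
To make the idea of querying a search form directly more concrete, here is a minimal Python sketch. It is not BrightPlanet’s implementation: the SourceConfig record, the build_query_url helper, and the example.com addresses are all hypothetical, and it assumes each source exposes a simple GET search form with a single keyword parameter. It only builds the query URLs and does not contact any site.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class SourceConfig:
    """Hypothetical record describing one site's search form."""
    name: str
    search_url: str   # address the search form submits to (assumed GET)
    query_param: str  # name of the form field that carries the keywords


def build_query_url(source: SourceConfig, keywords: str) -> str:
    """Return the URL a browser would request if the form were submitted."""
    return f"{source.search_url}?{urlencode({source.query_param: keywords})}"


# Two made-up sources; each site names its search field differently.
sources = [
    SourceConfig("Example Gazette", "https://news.example.com/search", "q"),
    SourceConfig("Example Court Records", "https://courts.example.org/find", "terms"),
]

# One keyword query fanned out across every configured source.
urls = [build_query_url(source, "deep web") for source in sources]
for url in urls:
    print(url)
```

The point of the sketch is that once a site’s form has been described as data, adding another source means adding another record rather than writing new code for it.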

In addition, a centralized Source Repository lets projects and teams across disciplines work independently while still sharing everything known about the sources. Sources can be configured and stored in many different source categories, so source records can be reused. The DeepHarvester communicates directly with the Source Repository at run time to select sources within a source category and is not tied directly to any one project, which maximizes reusability.
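
The category-based reuse described above can be pictured with a small Python sketch. Again, this is only an illustration under assumptions: the SourceRepository class, its add and select methods, the harvest_plan function, and the example domains are hypothetical names, not BrightPlanet’s actual design.

```python
from collections import defaultdict


class SourceRepository:
    """Hypothetical shared store that groups sources by category."""

    def __init__(self):
        self._by_category = defaultdict(set)

    def add(self, source, categories):
        # One source record can be filed under several categories.
        for category in categories:
            self._by_category[category].add(source)

    def select(self, category):
        # Called at run time; returns a sorted list, empty if the category is unknown.
        return sorted(self._by_category.get(category, set()))


def harvest_plan(repo, category, keywords):
    """Pair every source in a category with every keyword to be queried."""
    return [(source, kw) for source in repo.select(category) for kw in keywords]


repo = SourceRepository()
repo.add("news.example.com", ["Newspapers", "Finance"])
repo.add("courts.example.org", ["Law"])
repo.add("markets.example.net", ["Finance"])

# Two unrelated projects reuse the same repository with different categories.
finance_plan = harvest_plan(repo, "Finance", ["merger", "earnings"])
law_plan = harvest_plan(repo, "Law", ["ruling"])
print(finance_plan)
print(law_plan)
```

Because the harvest step only asks the repository for a category, a source configured once for one project is immediately available to any other project that selects the same category.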

What could we harvest from our Source Repository for you?

Deep Web sources are endless. What is that project or task in your business that requires you or one of your employees to go from website to website searching? Identify that project and sign up to schedule a free consultation call to see how BrightPlanet could help you get back valuable time you spend searching online.

Don’t think you have one of those projects? Download our FREE whitepaper on the intelligence that can be created from the Big Data on the Deep Web.



Photo: twechy