BrightPlanet
Our Technology

Learn more about our harvesting technology.

If you want to learn more about our harvesting technology, you’re in the right place. This walk-through of our harvesting process aims to answer some of the most common questions we receive and to take you behind the scenes. The following content is fairly technical; for a higher-level overview of our process, see our Data-as-a-Service page.

Harvest

We’ve developed a patented Deep Web Harvester that collects data from web pages using a set of predefined harvest types. Every harvest type is highly scalable and can harvest thousands of web pages. BrightPlanet’s Deep Web Harvester supports the following harvests:

  • Site Harvests use link crawling to harvest data. Starting from a set of identified seed URLs, the harvester follows the links it finds on each page.
  • Deep Web Harvests automate queries directly into web forms to find and harvest Deep Web data.
  • RSS Harvests ingest results from RSS feeds to help you stay on top of new content published to news and blog sites.
  • E-mail Harvests automate the harvesting of text content from e-mails in an e-mail account the harvester has access to.
  • Twitter Harvests use the Twitter Search API to harvest tweet data.
  • Facebook Harvests operate using the Facebook Open Graph API.
  • Scripted Harvests provide finer control through a scripting language, for websites that require custom interactions such as logging in with a username and password.
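The link crawling that Site Harvests rely on can be illustrated with a minimal breadth-first crawler. This is a generic sketch, not BrightPlanet’s Deep Web Harvester: the `fetch` callable, the same-host restriction, and the `max_pages` cap are assumptions made for illustration, and a production crawler would also need politeness delays, robots.txt handling, and retries.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(
    start_urls: list[str],
    fetch: Callable[[str], Optional[str]],  # hypothetical: returns HTML or None
    max_pages: int = 1000,  # assumed safety cap on pages harvested
) -> dict[str, str]:
    """Breadth-first link crawl restricted to the hosts of the starting URLs."""
    allowed_hosts = {urlparse(u).netloc for u in start_urls}
    queue = deque(start_urls)
    seen = set(start_urls)
    pages: dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # fetch failed or page missing: skip it
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is queued once.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc in allowed_hosts and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Usage: an in-memory "site" stands in for real HTTP requests.
site = {
    "https://example.com/": '<a href="/a">A</a> <a href="https://other.org/">off-site</a>',
    "https://example.com/a": '<a href="/">home</a> <a href="b#top">B</a>',
    "https://example.com/b": "<p>leaf page</p>",
}
pages = crawl(["https://example.com/"], site.get)
print(sorted(pages))
# ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

Passing a dictionary’s `get` method as `fetch` lets the crawl run without network access; in practice `fetch` would wrap an HTTP client. The breadth-first queue means pages closest to the seed URLs are harvested first, which is a common choice when a page budget cuts the crawl short.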
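Staying on top of newly published content, as RSS Harvests do, can be sketched as parsing a feed and keeping only the items not seen before. This sketch assumes RSS 2.0 (`<item>` elements with `<guid>`, `<link>`, `<title>`, and `<pubDate>`); Atom feeds use different element names and namespaces and are not handled. The `seen` set is a hypothetical stand-in for whatever persistent store a real harvester would use.

```python
import xml.etree.ElementTree as ET


def new_rss_items(feed_xml: str, seen: set[str]) -> list[dict[str, str]]:
    """Return RSS 2.0 items not harvested before, de-duplicated by <guid> or <link>."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        key = item.findtext("guid") or item.findtext("link")
        if not key or key in seen:
            continue  # no stable identifier, or already harvested
        seen.add(key)
        fresh.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return fresh


# Usage: a second poll of an unchanged feed yields nothing new.
feed = """<rss version="2.0"><channel>
  <item><title>First post</title><link>https://example.com/1</link>
        <guid>post-1</guid><pubDate>Mon, 01 Jul 2019 09:00:00 GMT</pubDate></item>
  <item><title>Second post</title><link>https://example.com/2</link></item>
</channel></rss>"""
seen: set[str] = set()
print([i["title"] for i in new_rss_items(feed, seen)])  # ['First post', 'Second post']
print(new_rss_items(feed, seen))  # []
```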

Curate

Once the Deep Web Harvester identifies a page to harvest, the curation process begins. Curation converts the web page into structured data. The process begins by extracting all of the text from the page into a completely unstructured format.
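The first curation step, reducing a page to unstructured text, can be approximated with Python’s standard `html.parser` module. This is an illustrative sketch, not BrightPlanet’s extractor: it assumes that everything outside `<script>` and `<style>` elements counts as page text, and it joins the pieces with single spaces.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.chunks)


page = ("<html><head><style>p { color: red }</style><title>Hi</title></head>"
        "<body><p>Acme &amp; Co.</p><script>var x = 1;</script><p>Springfield</p></body></html>")
print(extract_text(page))  # Hi Acme & Co. Springfield
```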

Entity Extraction

The data is now ready to be structured. We structure it using a process called entity extraction, which identifies key terms within a web page, typically the names of people, companies, and places. Extraction uses a rule-based engine, so we can also define custom entities for each customer, for example to tag the names of your products and other terms important to you.
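A rule-based entity extractor can be as simple as a dictionary (gazetteer) of known terms per entity type, compiled into whole-word regular expressions. The sketch below is a hypothetical illustration of that idea, not BrightPlanet’s engine; the labels and terms are made up, and adding a customer-specific label such as `PRODUCT` is just another dictionary entry. It does not resolve overlapping matches between labels.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int
    end: int


def build_rules(gazetteer: dict[str, list[str]]) -> list[tuple[str, re.Pattern]]:
    """Compile one whole-word, case-sensitive pattern per entity label."""
    rules = []
    for label, terms in gazetteer.items():
        # Try longer terms first so "Acme Corp" is preferred over "Acme".
        alternatives = sorted(terms, key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(re.escape(t) for t in alternatives) + r")\b"
        rules.append((label, re.compile(pattern)))
    return rules


def extract_entities(text: str, rules: list[tuple[str, re.Pattern]]) -> list[Entity]:
    """Tag every rule match, ordered by position; labels may overlap."""
    found = [
        Entity(m.group(), label, m.start(), m.end())
        for label, pattern in rules
        for m in pattern.finditer(text)
    ]
    return sorted(found, key=lambda e: e.start)


# Usage with made-up terms; PRODUCT is a hypothetical customer-specific label.
rules = build_rules({
    "COMPANY": ["Acme", "Acme Corp", "Globex"],
    "PLACE": ["Springfield"],
    "PRODUCT": ["RoadRunner 3000"],
})
text = "Acme Corp and Globex both sell the RoadRunner 3000 in Springfield."
print([(e.text, e.label) for e in extract_entities(text, rules)])
# [('Acme Corp', 'COMPANY'), ('Globex', 'COMPANY'), ('RoadRunner 3000', 'PRODUCT'), ('Springfield', 'PLACE')]
```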

Develop Insights

Now that the data is harvested and curated, you can begin asking questions of it. BrightPlanet customers typically interact with harvested data in one of three ways.

  1. Search Dashboard – BrightPlanet’s Search Dashboard gives your team direct access to a full-text searchable repository of all your harvested data in one place.
  2. API Access – BrightPlanet’s REST API gives you direct access to the data so you can ingest it into your own systems for internal use and analytics.
  3. Custom Data Visualization – BrightPlanet uses open-source and proprietary data visualization tools to help you analyze the data and develop insights from it.
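A full-text searchable repository is commonly built on an inverted index, which maps each term to the documents that contain it. The class below is a generic, in-memory illustration of that data structure, not a description of how the Search Dashboard is implemented; the tokenizer (lowercased runs of word characters) and the AND-only query semantics are simplifying assumptions.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")  # assumed tokenizer: runs of word characters


class FullTextIndex:
    """In-memory inverted index: term -> ids of documents containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.docs: dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text
        for term in TOKEN.findall(text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> list[str]:
        """Return ids of documents containing every query term (boolean AND)."""
        terms = TOKEN.findall(query.lower())
        if not terms:
            return []
        # .get() avoids adding empty entries to the defaultdict for unknown terms.
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches)


index = FullTextIndex()
index.add("d1", "Acme opens a new plant in Springfield")
index.add("d2", "Springfield news: Acme is hiring")
index.add("d3", "An unrelated blog post")
print(index.search("acme Springfield"))  # ['d1', 'd2']
print(index.search("plant"))  # ['d1']
```

Intersecting posting sets keeps each query proportional to the number of matching documents rather than the size of the whole repository, which is why inverted indexes underlie most full-text search engines.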

Videos

Explore the Deep Web and BrightPlanet’s processes with Data Acquisition Engineer Jamie Martin as your guide.

What Is the Deep Web?

Why We Use the Term Harvest When Talking About Web Data Collection

How We Determine Sources for Our Clients

BrightPlanet’s Curation Process Explained

Case Studies

See how other companies are taking advantage of BrightPlanet’s Data-as-a-Service for their business.

Get Started with a Deep Review

Ready to get started? Our Deep Review is a great place to begin. You’ll get direct access to our engineering and consulting team in a 6-week funded proof of concept using your actual data.

Schedule a Consultation

Schedule a free consultation with a BrightPlanet® Data Acquisition Engineer today.

Deep Web University

Discover over 200 articles we’ve written about the Deep Web in our Deep Web University.

Copyright © 2001-2019 BrightPlanet® Corporation. All Rights Reserved.