DeepHarvester™ Workbench
License a Workbench for Hands-on Control of your Deep Web Harvesting.
BrightPlanet’s DeepHarvester™ Workbench provides the most comprehensive harvesting and content normalization system on the market today, at the scale of the internet. BrightPlanet offers the DeepHarvester Workbench for those who need to control their own harvesting within their own infrastructure.
The Workbench can be used as a standalone lightweight Web user interface or can be tightly integrated into a custom or enterprise solution through the OpenPlanet™ Dashboard.
Download our marketing materials for DeepHarvester™ Workbench
FEATURES
BrightPlanet has developed a patented, heuristic, rule-based expert system for automatically communicating with Deep Web sources that does not require one-off scripts to be built by hand. The DeepHarvester Workbench is wrapped around BrightPlanet’s flagship DeepHarvester™ Platform, which has been developed and refined over the past 10 years. The DeepHarvester features include:
- Harvesting from Deep Web sites that require the use of a query
- Integrating with internal Deep Web sources
- Leveraging existing surface web search engines
- Harvesting using traditional crawl or surface web techniques
- Harvesting links through RSS feeds
- Supporting standard crawler features:
  - Optionally honoring robots.txt rules
  - Customizable user-agent tags
  - Timeout and redirect limit settings
  - Support for session cookies
- Optionally scripting custom sources
- Harvesting inline content options: images, CSS and JavaScript files
- Supporting proxy servers: anonymization through third-party solutions
- Providing a multi-thread harvest engine built on a distributed platform
- Accessing the OpenPlanet™ Platform: custom normalization, analytics and storage
- Working with BrightPlanet’s Deep Web Source Repository
- Harvesting and profile management: Java, Web Services or RMI API
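As a general illustration of one of the standard crawler features listed above, optionally honoring robots.txt rules with a customizable user-agent, here is a minimal sketch using Python's standard library. It does not show the DeepHarvester API; the names `USER_AGENT`, `HONOR_ROBOTS`, `build_parser` and `may_fetch`, and the sample rules, are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler settings; names and values are illustrative only.
USER_AGENT = "ExampleHarvester/1.0"
HONOR_ROBOTS = True

# Sample robots.txt content, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str,
              honor_robots: bool = HONOR_ROBOTS,
              user_agent: str = USER_AGENT) -> bool:
    """Return True if the crawler is allowed to request the URL."""
    if not honor_robots:
        # Rule checking is switched off, so every URL is permitted.
        return True
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
```

With these sample rules, `may_fetch(parser, "https://example.com/private/data")` is rejected while a URL outside `/private/` is allowed; passing `honor_robots=False` bypasses the check entirely.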
BrightPlanet has provided three-year licenses of the Deep Web Harvester to U.S. Government agencies, allowing them to harvest behind their firewalls. While BrightPlanet highly recommends using its Content Navigators, its experienced personnel, to navigate and harvest the Deep Web and to optimize Deep Web Content Silos, it is now pleased to make the DeepHarvester Workbench option available to the commercial market.


