Why you should tap into the Deep Web in 2015

It’s no surprise to anyone that use of the Internet has continued to grow steadily. Over three billion people, or 42% of the global population, now have direct access to the Internet at home. These three billion people all contribute to the content on the Internet.

In this posting, we cover why you should tap into content on the Web in 2015. We also recap how users of Web data capitalized on harvested and enriched content from the largest known database in existence, the Internet, in 2014.

Tapping into the Deep Web in 2015

There are many reasons why you should start tapping into the Internet and using it as a source of data. The two main reasons we’ll cover in this post are:

The exponential growth of the Internet
Scalable Web harvesting from disparate sources augments data you don’t currently have

Growth of the Web

The exponential growth of the Internet is our first reason why you should begin tapping into the Web in 2015. This November we released a post explaining why we now classify the Internet as infinite. The infinite classification is based on the sheer number of users contributing to content on the Web as well as the complete personalization of the Internet. By the time you’ve finished reading this article, 3.5 million new tweets will have been posted by Twitter’s 271 million active users.

The growth of the Internet has created a need for Web harvesting and analytics that are completely scalable and allow hundreds of disparate sources to be harvested at once.

Scalable Big Data

Our second reason that 2015 is the year to use data harvested from the Web is that taking advantage of data at Big Data scale is more feasible now than ever.

As companies continue to understand how to leverage their own internal structured data, they’ll look for additional datasets externally to further augment their internal data and build upon their current Big Data projects. Collecting external data from the largest known database in existence, the public Internet, at large scale will allow companies to continue to leverage Big Data to improve their business.

Customers Doing Anything with Data

We realize there are endless possibilities for what you can do with Web data as long as you have a little creative thinking and the proper data set. We’ve collected Web data and helped customers use Web data in some extremely interesting ways across a wide array of industries in 2014.

Gone are the days of using Web data only for marketing and reputation management. We cover two examples of how our customers took advantage of Web data in 2014.

Tracking and Identifying Fraud

Problem:

A Fortune 100 company in a high-margin industry was hemorrhaging potential profits to overseas counterfeiters. These counterfeiters advertised brand name products at a fraction of the retail price on trade boards, fly-by-night websites, e-commerce sites, message boards, and social media.

The company’s traditional strategy was to hire an external brand protection service. This solution couldn’t scale to the wide scope of the Internet, where legitimate profits were being siphoned off by fraudulent websites without the company’s knowledge.

Solution:

We implemented a scalable process to automatically monitor the Internet for any mention of the company’s brand name products. Websites, message boards, trade boards, and social media were monitored using our AuthentiWeb solution powered by our Deep Web Harvester.

Our Deep Web Harvester allows us to query directly into the search forms of websites to collect content that traditional collection technologies cannot reach. Think of an e-commerce site like eBay. eBay estimates that, on average, approximately 113 million products are listed for sale at any given time.

Traditional data collection technologies like site crawling are forced to crawl through each of those 113 million listings to find and capture the relevant content. Our technology lets users select multiple queries and automatically submits them through the site’s search form, so only the content needed for analysis is collected.
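To make the contrast with crawling concrete, here is a minimal Python sketch of the general idea of driving a search form with a list of queries. It is not the Deep Web Harvester itself. The endpoint `https://shop.example.com/search`, the parameter names `q` and `sort`, and the function `build_search_urls` are all hypothetical, chosen only for illustration. The sketch builds one request URL per query and does not fetch anything.

```python
from urllib.parse import urlencode


def build_search_urls(search_endpoint, queries, query_param="q", fixed_params=None):
    """Return one search-form URL per query term.

    search_endpoint, query_param, and fixed_params are assumptions about
    how a target site's search form works; real sites differ.
    """
    urls = []
    for query in queries:
        # Start from any parameters shared by every request, then add the query.
        params = dict(fixed_params or {})
        params[query_param] = query
        urls.append(f"{search_endpoint}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    # Hypothetical brand queries against a hypothetical storefront.
    for url in build_search_urls(
        "https://shop.example.com/search",
        ["acme watch", "acme handbag"],
        fixed_params={"sort": "price_asc"},
    ):
        print(url)
```

With two queries, the sketch produces two targeted requests instead of a walk through every listing on the site, which is the efficiency argument made above.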

Websites flagged for counterfeit activity were accumulated and sent via customized weekly reports. These reports contained extracted competitor product information, online price points, contact information, online chat services, e-commerce options, WhoIs data, etc. All data was also loaded into the AuthentiWeb dashboard for the client.

Improving Insurance Efficiency Using Web Data

The insurance industry is a major user of data collected from disparate data sources. Every step in the process from underwriting to identifying fraudulent insurance claims uses data.

Problem:

Major insurance agencies spend vast amounts of resources underwriting or trying to qualify potential customers. One major insurance agency uses over 1,000 different datasets to qualify U.S.-based customers for homeowners insurance. These datasets cover everything from neighborhood crime to credit scores and severe weather.

The sheer amount of data available for customers applying for insurance in the U.S. was not available for qualifying customers for homeowners insurance in other countries like Portugal and Turkey. Crime data for cities in Portugal that could easily be leveraged simply didn’t exist, forcing underwriters into a guessing game.

Solution:

To better understand what crimes were occurring in other countries, and so better price and qualify homeowners insurance, the insurance company turned to Web data. We harvested content from local news sources at large scale and extracted mentions of crimes. This gave the insurance company additional data, previously believed to be unavailable, on which to qualify and price homeowners insurance policies.
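As a rough illustration of what extracting crime mentions from harvested news text can look like, here is a small Python sketch. It is a simplified keyword tally, not the extraction process we actually ran. The term list `CRIME_TERMS`, the function `count_crime_mentions`, and the article format (a dict with `city` and `text` keys) are assumptions made for this example, and the sample articles are made-up placeholders. Real sources in Portugal or Turkey would need local-language terms.

```python
import re
from collections import Counter

# Assumed vocabulary for illustration only; a real deployment would use
# terms in the language of the news sources being harvested.
CRIME_TERMS = ("burglary", "robbery", "theft", "vandalism")


def count_crime_mentions(articles, crime_terms=CRIME_TERMS):
    """Tally mentions of each crime term per city across article texts."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in crime_terms) + r")\b",
        re.IGNORECASE,
    )
    counts = Counter()
    for article in articles:
        for match in pattern.finditer(article["text"]):
            # Key each tally by (city, normalized term).
            counts[(article["city"], match.group(1).lower())] += 1
    return counts


if __name__ == "__main__":
    # Made-up placeholder articles, not real harvested data.
    sample_articles = [
        {"city": "Lisbon", "text": "Police reported a burglary and a robbery downtown."},
        {"city": "Porto", "text": "A theft was reported near the station."},
    ]
    for (city, term), total in sorted(count_crime_mentions(sample_articles).items()):
        print(city, term, total)
```

Tallies like these, aggregated by city or neighborhood, are the kind of derived dataset an underwriter could use where no official crime statistics are available.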

Learn how you can tap into the Deep Web in 2015

Download our ‘Understanding the Deep Web in 10 Minutes’ white paper to learn more about the Deep Web and what it contains. You’ll learn a lot in 10 short minutes.

Photo: Luke Ma (Flickr)