Big Data Mining: Harvesting the Deep Web

Tracking online activity is a difficult business. As people move more and more of their lives onto the World Wide Web, they expose a wealth of information, whether intentionally or unintentionally. With this come new methods of tracking down wrongdoing: every day, people use online media to communicate about or coordinate illegal activities. But the internet is a big place, and tracking down these cases, which means performing the necessary Big Data Mining, is not as simple as typing a few keywords into Google or another search engine.

Mining the Data

There are a variety of traditional analytics platforms that claim to be the perfect product for intelligence gathering. These analytics utilities comb the web for you, doing the equivalent of hundreds or even thousands of individual searches. Certainly, these tools can be helpful, and they are a superior choice to hiring someone to run all of those searches manually. Manual monitoring would be a never-ending task, and either way valuable data would still be left untouched.

The Deep Web

This type of untouched data lies below the surface of what traditional analytics platforms and search engines can access. In fact, there’s an entire unindexed stretch of the internet known as the Deep Web. Think of it as the Mariana Trench of the World Wide Web. This Deep Web remains unexplored by virtually every tool out there. Big Data Mining, however, allows access to all of this previously inaccessible Deep Web information, from posts to tweets to RSS feeds, and doesn’t leave out “simple” things like surface sites either. An ideal data mining tool will serve as your complete search and intelligence gathering tool, delving into the Deep Web to retrieve the information you need.
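
To make the idea of combing feeds for keywords a little more concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular product works: the sample feed, the example.com links, the keyword list, and the find_matches helper are all hypothetical, and a real tool would fetch feeds over the network and handle far more formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample RSS feed; a real tool would download this from a feed URL.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Forum</title>
    <item>
      <title>Selling used bikes</title>
      <description>Three road bikes, good condition.</description>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Bulk order question</title>
      <description>Looking for counterfeit handbags in bulk.</description>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def find_matches(feed_xml, keywords):
    """Return (title, link, matched keywords) for each feed item that mentions a keyword."""
    root = ET.fromstring(feed_xml)
    lowered = [k.lower() for k in keywords]
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        # Case-insensitive substring match over the title and description.
        text = (title + " " + description).lower()
        hits = [k for k in lowered if k in text]
        if hits:
            matches.append((title, item.findtext("link", default=""), hits))
    return matches


if __name__ == "__main__":
    for title, link, hits in find_matches(SAMPLE_FEED, ["counterfeit", "stolen"]):
        print(title, link, hits)
```

Running the same function over many feeds and many keywords is, in miniature, the "hundreds to thousands of individual searches" that an analytics tool performs for you.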



Photo Courtesy of Explorer Bjorn