Infinite Possibilities: Harvesting and Enriching Public Opinion Data from the Web
The Internet is infinite.
It’s difficult to grasp the sheer amount of knowledge available online. That means there’s an endless amount of information waiting for you to harvest and use for your business.
But how do you make sense of the unlimited stream of information online? How do you even know where to start and what to consider useful?
BrightPlanet, along with our technology partner Rosoka, can provide the answers. In this post, we discuss harvesting and enriching online data so that it can be analyzed and visualized, a step we’ll cover in a later post.
Endless Amounts of Data
The Web today is effectively infinite: it has a beginning, but no end that anyone can find.
You may be wondering how this happens.
Consider Twitter. It estimates an average of 5,700 Tweets every second. That adds up to about 500 million a day, each with its own URL, on the Twitter domain alone. And Twitter is just one of an estimated 400 million active domains online today.
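If you want to check the math yourself, here is a quick sketch in Python. The only input is the 5,700 Tweets-per-second average quoted above; the variable names are ours.

```python
# Daily Tweet volume implied by the quoted per-second average.
TWEETS_PER_SECOND = 5_700
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

tweets_per_day = TWEETS_PER_SECOND * SECONDS_PER_DAY
print(f"{tweets_per_day:,} Tweets per day")  # 492,480,000, or roughly 500 million
```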
This is still hard to wrap your head around (these are numbers we can’t even visualize), but it gives you some sense of the infinite nature of the Web.
Internet Use on a Grand Scale
Now that you know where all of this data comes from, what do you do with it? That’s where our team at BrightPlanet comes in.
BrightPlanet’s harvesting technology collects content and data in much the same way you use the Web every day. You might follow links to news stories and articles of interest, or search for your next family vacation. You might subscribe to RSS feeds of your favorite blogs and monitor what people are doing on social media sites.
We’re doing the same things, but on a much, much larger scale.
When we harvest data from a webpage — whether it’s 1 or 10,000 — we collect all of the text-based content, normalize it and store it. This way we’ve got an original record of the page regardless of whether it changes or is removed.
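To make "collect, normalize and store" a little more concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under our own assumptions, not BrightPlanet's actual pipeline: the names TextExtractor, normalize_text and store_page are hypothetical, the page is passed in as a string rather than downloaded, and the "store" is a plain in-memory dictionary.

```python
import hashlib
import re
import unicodedata
from datetime import datetime, timezone
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects a page's text content, skipping script and style blocks."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def normalize_text(html):
    """Strip markup, unify Unicode forms, and collapse runs of whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = unicodedata.normalize("NFKC", " ".join(parser.parts))
    return re.sub(r"\s+", " ", text).strip()


def store_page(archive, url, html):
    """Append a timestamped snapshot of the page's text to the archive."""
    text = normalize_text(html)
    record = {
        "url": url,
        "harvested_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "text": text,
    }
    # Keep every snapshot, so the record survives if the page changes or disappears.
    archive.setdefault(url, []).append(record)
    return record


# Hypothetical page and URL, for illustration only.
archive = {}
page = (
    "<html><head><style>p{}</style></head>"
    "<body><h1>Poll  results</h1><p>Turnout\nrose.</p></body></html>"
)
record = store_page(archive, "https://example.com/story", page)
print(record["text"])  # Poll results Turnout rose.
```

Keeping a list of snapshots per URL, rather than overwriting, is what preserves the original record when a page is later edited or removed.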
Our harvesting, though, collects this data in an unstructured format, so we then need to organize it, or in other words, enrich it.
Organizing Harvested Data
To give the data more structure, we enrich it by doing entity extraction. Entity extraction is the process of identifying key terms within the data that have value. A customer specifies which key terms are most relevant (people, places, brand names, etc.) and we tag these key terms and their relationships to each other in the data at large scale.
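As a rough illustration of the idea, here is a simplified sketch in Python. It is not how Rosoka or BrightPlanet performs extraction: it assumes a plain dictionary lookup against the customer's term list, and it treats two terms as "related" only when they appear in the same sentence. The function names, the sample text and the sample terms are all hypothetical.

```python
import re
from itertools import combinations


def extract_entities(text, terms_by_type):
    """Tag each occurrence of a customer-specified term with its type and position."""
    entities = []
    for entity_type, terms in terms_by_type.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                entities.append(
                    {
                        "type": entity_type,
                        "term": term,
                        "start": match.start(),
                        "end": match.end(),
                    }
                )
    return sorted(entities, key=lambda entity: entity["start"])


def co_occurrences(text, terms_by_type):
    """Pair up terms that share a sentence (a crude stand-in for relationships)."""
    pairs = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        found = sorted({e["term"] for e in extract_entities(sentence, terms_by_type)})
        pairs.update(combinations(found, 2))
    return pairs


# Hypothetical customer term list and sample text.
terms = {"person": ["Jane Doe"], "place": ["Iowa"], "issue": ["health care"]}
text = "Jane Doe spoke in Iowa. Voters there asked about health care."

entities = extract_entities(text, terms)
pairs = co_occurrences(text, terms)

for entity in entities:
    print(entity["type"], entity["term"], entity["start"], entity["end"])
print(pairs)  # {('Iowa', 'Jane Doe')}
```

A real extraction engine goes much further (name variants, misspellings, context, multiple languages), but the output has the same general shape: tagged terms and the links between them.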
If someone wanted to enrich the data themselves, they would have to read every word and highlight the key terms, an incredibly manual and arduous process. Harvesting and enrichment prepare the data for analysis and visualization, which we’ll talk about in our next post.
Harvesting and Enriching in Action
For a recent webinar, we collected data to gauge public opinion related to each of the candidates in the 2016 presidential campaign. We harvested data from over 9,000 global news sources and Twitter in mere minutes.
We then enriched the data by tagging key terms (extracting entities) like candidate names and key campaign issues. This gave us the structured data we needed to move to the next step, analysis. In our next post, we’ll talk about how we took this enriched data and analyzed and visualized it to gauge public opinion.
Find Your Data Solution
Yes, there’s an infinite amount of data online, but whatever type of data you need, BrightPlanet can harvest it.
From there, you can perform the relevant analysis to solve problems or answer questions about your business.
Are you interested in learning how your business can benefit from BrightPlanet’s data harvesting services? Contact our team for a free demo and we’ll show you how to harness the endless power of the Internet.