Open Data vs. Web Content: Why the Distinction?
For those who are unfamiliar with our line of work, the difference between open data and web content may be confusing. In fact, it’s a question without a clear answer even for those of us who work with Deep Web data extraction every day.
One of our best practices as a company is reaching out to other companies and firms in the data community. To stay at the top of our game, we benefit from hearing industry perspectives other than our own.
To get more insight on this topic, our Vice President of Business Development, Tyson Johnson, spoke with some of the team members at Gartner. A world-renowned research and advisory firm, Gartner provides technology insight to businesses around the globe.
Open Data vs. Web Content
According to Tyson’s conversation with Gartner, the firm’s perspective is that open data is online information that is readily findable and meant to be consumed or read by a person looking for it (e.g., a news article or blog post). Web content, conversely, is content that wasn’t necessarily meant to be consumed by individuals in the same way. It is available, but people likely don’t know it exists or how to get it (e.g., information on the Deep Web).
In much of the work we do, it is debatable whether the data involved is something many people are aware of and consuming.
For example, we’ve been issuing queries in the insurance space for commercial truck driving. This is definitely information that people are aware of, but the data that comes back from Deep Web data extraction isn’t necessarily easy to consume or access. So is it open data or web content?
It’s information that a random person surfing the Internet can find if they want to look for it. However, many aren’t aware that the Deep Web exists. They also don’t know that they have the ability to pull back even more relevant information.
So why is this distinction even being discussed? The data industry has struggled with what to call things so people can actually wrap their heads around what’s out there.
The industry is realizing we need to make a distinction between two things. The first is what most Internet users know they can consume: news articles, information on their favorite sports team, the weather of the day, etc. (open data). The second is what they probably don’t know about: something called the Deep Web, where they can issue queries into other websites and pull back even more information that’s relevant to what they’re looking for (web content).
Making as many people as possible aware of the data available to them is at the core of the distinction. As long as you understand the difference, we think it’s okay to name it and explain it however you want.
Web Data and How We Use It
BrightPlanet works with all types of web data. Our true strength is automating the harvesting of information that you didn’t know existed.
Here is how it works: you may know of ten websites that have information relevant to your challenge.
We then harvest the data we are permitted to collect from those sites through Deep Web data extraction. More than likely, we’ll also find many additional sources that will be of use to you.
The best part is that as our definitions of data expand, so do our capabilities.
Future Data Distinctions and Trends
It was once thought that there were three levels of data we worked with: Surface Web, Deep Web, and Dark Web. According to Tyson, the industry is discovering that there may be additional levels within these categories that go beyond even open data and web content.
On top of all of this is the relatively new concept of the industrial Internet. The industrial Internet is a collection of gigabits of data generated by industrial equipment like jet engines and wind turbines. Tyson points out that the industrial Internet may be three times the size of the consumer Internet we’re familiar with. So when the industrial Internet becomes more mainstream, will it be considered web content, with everything on the consumer Internet becoming open data? We’ll have to wait and see.
These future trends put us in a good position to help tackle your challenges and find creative solutions. We harvest all types of data. If you’re curious about how BrightPlanet can help you and your business, tell us what you’re working on. We’re always happy to give you insight into what our Data-as-a-Service can do for you.