In our previous post, we covered the history of search and Deep Web Harvesting, learning how search started and where we are today. In this post we continue that conversation and discuss:
- The current form of search
- How it is evolving into Search 2.0
- What’s in store for Search 3.0
The Current Form of Online Search (Search 1.0)
Online search in its current form focuses almost exclusively on unstructured data sets. We use the term unstructured for data that a machine has stored but cannot easily make sense of by itself: the data has no formatting that identifies and classifies what is actually stored.
A good way to picture this is an Excel spreadsheet in which we want to store the data from a web page. A completely unstructured version of this data would be all of the text from the web page in one single cell of the spreadsheet.
With all of the data stored in one cell, you can’t perform any advanced analytics or filtering; you can only search that one cell. This is the current form of online search: users performing keyword searches on completely unstructured data sets.
With semi-structured data, you begin to have the capability to do advanced filtering. Semi-structured data in an Excel spreadsheet would have the majority of the text in the first cell, but also include additional data about the text in other cells. An example of what this would look like is shown to the right.
Finally, a completely structured data set would have every word or phrase extracted and placed into different cells, with an attribute assigned to each one of those words or phrases.
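To make the three levels concrete, here is a minimal Python sketch of the same web page stored each way. The page text, the field names, and the attribute labels are all invented for illustration; they are assumptions and do not describe any real system.

```python
# Hypothetical page text, used only for illustration.
page_text = "Dr. Jane Smith of Example University published a diabetes study in Chicago."

# Unstructured: the whole page sits in a single "cell".
unstructured = {"text": page_text}

# Semi-structured: the text plus a few extra "cells" that describe it.
semi_structured = {
    "text": page_text,
    "people": ["Jane Smith"],
    "places": ["Chicago"],
}

# Structured: each extracted phrase is paired with an attribute.
structured = [
    ("Jane Smith", "person"),
    ("Example University", "organization"),
    ("diabetes study", "topic"),
    ("Chicago", "place"),
]


def keyword_search(record, keyword):
    """The only operation unstructured data supports: match on the text."""
    return keyword.lower() in record["text"].lower()


def phrases_with_attribute(rows, attribute):
    """Structured data can be queried by attribute instead of by keyword."""
    return [phrase for phrase, attr in rows if attr == attribute]
```

With the unstructured record, all you can ask is whether a keyword appears. With the structured rows, you can ask for every place or every person directly.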
Moving Forward to Search 2.0
Search 2.0 is the ability to search semi-structured data. More than likely, you have already done this without realizing it. The most common place Search 2.0 already happens is on online shopping sites. If you navigate to Amazon.com or zappos.com and search for “Men’s Shoes”, you receive a full listing of all the men’s shoes for sale on the site.
The results you see are keyword matches on unstructured text. However, the interface also lets you filter shoes by type, color, price, brand, size, shape, customer review, etc. These elements, extracted and placed into the side panels of the interface, are the semi-structured portions of the documents. They allow you not only to find the exact shoe you are looking for, but also to get a sense of what shoes the site currently has for sale. It’s this ability to filter that is considered Search 2.0.
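The shopping-site behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catalog, its field names, and the helper functions are hypothetical, and real shopping sites use far more sophisticated search engines.

```python
from collections import Counter

# Hypothetical catalog: free text plus a few extracted attributes.
CATALOG = [
    {"text": "Men's leather dress shoes", "color": "black", "brand": "Acme"},
    {"text": "Men's running shoes", "color": "blue", "brand": "Acme"},
    {"text": "Men's canvas shoes", "color": "black", "brand": "Zenith"},
    {"text": "Women's running shoes", "color": "blue", "brand": "Zenith"},
]


def keyword_search(items, query):
    """Keep items whose text contains every query word as a whole word."""
    words = query.lower().split()
    return [
        item for item in items
        if all(word in item["text"].lower().split() for word in words)
    ]


def facet_counts(items, field):
    """Count how many results fall under each value of one attribute."""
    return Counter(item[field] for item in items)


def apply_filter(items, field, value):
    """Narrow the results to a single attribute value."""
    return [item for item in items if item[field] == value]


results = keyword_search(CATALOG, "men's shoes")      # keyword match on text
color_facet = facet_counts(results, "color")          # side-panel counts
black_only = apply_filter(results, "color", "black")  # user clicks "black"
```

The keyword match works on the unstructured text, while the facet counts and the filter work on the extracted attributes. The counts are what give you the overview of what is for sale.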
Now imagine this concept of Search 2.0 applied to your traditional Google search environment. A search in Google would still give you results matching the unstructured text of the web pages, but the interface would also include dynamically created filters that let you narrow the search results further and get a macro sense of the results currently displayed.
A Search 2.0 Scenario
Imagine this potential scenario. A health researcher searching in Google wants to quickly uncover information about a topic area. When the term is searched in Google, not only do the results pages show, but the following dynamically generated categories are also displayed:
- People mentioned within the results
- Places in the results
The health researcher can use the filtering (similar to you buying shoes) to quickly find the exact results he or she is hoping to identify. The researcher can also get a better understanding of the result set at a macro level very quickly. Just as you can identify which shoe color is most widely available online, the health researcher could identify the most mentioned people, companies, and places, quickly identifying the subject matter experts and/or leading universities.
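As a rough sketch of how such dynamically generated categories could be computed, the Python below counts the people and places mentioned across a result set. It assumes the entities have already been extracted from each page; the sample results, names, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical search results with people and places already extracted.
RESULTS = [
    {"title": "Study A", "people": ["Jane Smith", "Raj Patel"], "places": ["Chicago"]},
    {"title": "Study B", "people": ["Jane Smith"], "places": ["Boston", "Chicago"]},
    {"title": "Study C", "people": ["Ana Lopez"], "places": ["Chicago"]},
]


def most_mentioned(results, category, top_n=3):
    """Count how many results mention each entity, most frequent first."""
    counts = Counter()
    for result in results:
        counts.update(set(result[category]))  # count each entity once per result
    return counts.most_common(top_n)


def filter_by_entity(results, category, entity):
    """Keep only the results that mention the chosen entity."""
    return [result for result in results if entity in result[category]]


top_person = most_mentioned(RESULTS, "people", top_n=1)
smith_results = filter_by_entity(RESULTS, "people", "Jane Smith")
```

The counts give the macro view (who and where comes up most often), and the filter narrows the list to the results that mention a chosen person or place.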
Imagine being able to hover over specific web results and have the parts of the pages most important to you display as they are extracted from that specific page. This extracted data gives your search some structure, creating what we call a Search 2.0 experience through semi-structured data.
The Potential of Search 3.0
Search 3.0 is the concept of searching completely structured data sets. The most exciting aspect of being able to search completely structured data sets is that the concept of search using keywords is completely thrown out the window. Your current search experience is very limited because it’s based solely on a standard keyword search. Often the results you get back aren’t completely accurate, because the keyword you are using may have more than one meaning or you should be using a different keyword to find the results you are really looking for.
Searching completely structured data sets allows you to use full documents as search criteria as opposed to a single keyword. The concepts in documents would be stored and indexed rather than just raw text. Rather than starting with a keyword, your searches could potentially start with another web page. Now your Google experience would be: find other web pages that talk about the same concepts as this specific website. Health researchers could identify other pages containing information similar to their research. Patent attorneys could identify any pages containing literature relating to a patent they are currently applying for. The list of potential applications goes on and on.
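To give a feel for using a whole document as the query, here is a small Python sketch. Note the substitution: instead of extracted concepts, it uses plain word counts compared with cosine similarity, which is a much cruder stand-in. The page names, page texts, and function names are assumptions made up for this example.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase word counts; a crude stand-in for extracted concepts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query_document, pages):
    """Rank pages by similarity to a whole document, not a single keyword."""
    query = term_vector(query_document)
    scored = [
        (cosine_similarity(query, term_vector(text)), name)
        for name, text in pages.items()
    ]
    return [name for _score, name in sorted(scored, reverse=True)]


# Hypothetical indexed pages.
PAGES = {
    "page_a": "insulin therapy outcomes in diabetes patients",
    "page_b": "patent filing deadlines and attorney fees",
    "page_c": "diabetes patients and diet",
}

ranking = rank_similar("a study of insulin therapy for diabetes patients", PAGES)
```

The query here is a full sentence standing in for a full document; no single keyword is chosen by the user. A concept-based index would go further by matching pages that discuss the same ideas in different words, which word counts alone cannot do.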
Why hasn’t Google gotten us to Search 3.0? To get to Search 3.0, rather than storing references to specific pages in Google’s index, the actual text of each indexed web page must be stored for analytics. This poses a problem for Google: as they attempt to index the entire Surface Web, major scalability issues arise.
For technologies like BrightPlanet that focus on harvesting from specific relevant sources, Search 2.0 and Search 3.0 features are here today. Do you have any questions about doing better online searching today? If so, sign up for a free consultation with one of our Deep Web Investigators to discuss how you are currently searching online and what the potential might be to make it more efficient and effective.