API

The BrightPlanet Document API is part of BrightPlanet’s REST API. It allows queries against the curated data feeds provided by our Data-as-a-Service platform and behaves similarly to the search feature available in BrightPlanet’s Search Dashboard.

Before digging in, it’s important to know that the Document API is focused on content within the data feed. Results will only contain content that has already been harvested from your subscribed data feeds. This quick start guide shows you how to find an appropriate data feed and then begin requesting content from it. BrightPlanet’s Document API can also be explored and tested in your browser at https://api.brightplanet.com/.

Quick Start Guide

The API is built around the /docs/search call, which lets users request data from BrightPlanet’s harvested documents through a highly flexible query engine.

Getting an API Key

Before making any requests you must have an API Key. If you do not have a key, please contact BrightPlanet’s support at [email protected] to request one.

An API Key is a unique identifier associated with your account and license. Every call to our API must include your API Key, and each call is metered and logged against that key for audit purposes.
Never share your API Key. Any applications built around our API should allow the end-user to enter their own API Key instead of embedding your API Key.

An API Key is a GUID and looks something like this: 12345678-90ab-cdef-1234-567890abcdef
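One way for a client application to follow this advice is to read the key from the end user's environment and check that it has the GUID shape before sending any request. The sketch below is illustrative only: the BRIGHTPLANET_API_KEY variable name is a hypothetical choice, and the check confirms the format of the key, not that BrightPlanet will accept it.

```python
import os
import uuid

# Hypothetical variable name; a prompt or config file would work equally well.
API_KEY_ENV_VAR = "BRIGHTPLANET_API_KEY"


def looks_like_api_key(value) -> bool:
    """True if value parses as a GUID like 12345678-90ab-cdef-1234-567890abcdef."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


def load_api_key() -> str:
    """Read the end user's own API key instead of embedding one in the code."""
    key = os.environ.get(API_KEY_ENV_VAR, "").strip()
    if not looks_like_api_key(key):
        raise RuntimeError(f"Set {API_KEY_ENV_VAR} to your BrightPlanet API key")
    return key
```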

GET /dataFeeds

Once you have a valid API Key, you can view which data feeds (or databases) you have access to using the “/dataFeeds” endpoint. BrightPlanet provides both standard and custom data feeds. Each customer only has access to the data feeds they have licensed; to learn more about additional data feeds, contact your sales representative.

The “/dataFeeds” endpoint is only needed to look up which data feeds your API Key can access. That list does not change from one request to the next, so it is fine to cache the data feed names between sessions.

HTTP GET
/dataFeeds

When using the “/dataFeeds” request, pass your api_key. Note that the dataFeeds path is case sensitive. The URL below shows an example.

https://documentapi.brightplanet.com:443/documentapi/dataFeeds?api_key=[Your_Api_Key]

This lists the data feeds accessible with this api_key, along with the total number of API requests used per data feed. Your request allowance depends on your license agreement; once usage reaches your maximum, additional API requests will produce a rate limit error.

The response below shows that this api_key has access to one data feed, called “bits”, and has already made 13 API requests.

{
  "datafeedName": "bits",
  "usedRequests": 13
}
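A minimal Python sketch of the /dataFeeds call, including the between-session cache mentioned above. It assumes the base URL from the example request; the cache directory and file naming are hypothetical choices. Because this page shows the name field both as datafeedName and as dataFeedName, the parser accepts either spelling.

```python
import hashlib
import json
import pathlib
import urllib.parse
import urllib.request

# Base URL from the example above; CACHE_DIR is a hypothetical location.
BASE_URL = "https://documentapi.brightplanet.com/documentapi"
CACHE_DIR = pathlib.Path.home() / ".brightplanet"


def data_feeds_url(api_key: str) -> str:
    # The path is case sensitive: /dataFeeds, not /datafeeds.
    return f"{BASE_URL}/dataFeeds?" + urllib.parse.urlencode({"api_key": api_key})


def feed_names(payload) -> list[str]:
    """Pull feed names out of a response that is one object or a list of them."""
    entries = payload if isinstance(payload, list) else [payload]
    return [e.get("dataFeedName", e.get("datafeedName")) for e in entries]


def list_data_feeds(api_key: str, refresh: bool = False) -> list[str]:
    """Return the feed names for api_key, caching them between sessions."""
    # Hash the key so the key itself never appears in a file name.
    digest = hashlib.sha256(api_key.encode()).hexdigest()[:16]
    cache_file = CACHE_DIR / f"datafeeds-{digest}.json"
    if cache_file.exists() and not refresh:
        return json.loads(cache_file.read_text())
    with urllib.request.urlopen(data_feeds_url(api_key), timeout=30) as resp:
        names = feed_names(json.load(resp))
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(names))
    return names
```

Only the feed names are cached; usedRequests changes with every call, so it should be read fresh when you need it.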

GET /dataFeed/{dataFeed}

Each data feed you have access to has custom entities that are extracted from the data as it is harvested. To view the custom entities available in a specific data feed, use the GET /dataFeed/{dataFeed} request.

An example request for the Global News Data Feed would be:

https://documentapi.brightplanet.com/documentapi/dataFeed/bits?api_key=[Your_Api_Key]

The response then displays all the facetFields that are available in the Global News Data Feed:

[  
   {  
      "dataFeedName":"bits",
      "usedRequests":84,
      "facetFields":[  
         "otherEntity_publication",
         "otherEntity_disease",
         "otherEntity_relationshipDisease",
         "otherEntity_person",
         "otherEntity_drug",
         "otherEntity_place",
         "otherEntity_relationshipPlace",
         "otherEntity_certification",
         "otherEntity_facility",
         "otherEntity_weapon",
         "otherEntity_company",
         "otherEntity_chemical"
      ]
   }
]
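The sketch below requests a feed's facetFields, again assuming the base URL used in the examples. The short_facet_name helper is a hypothetical convenience: it assumes the count calls described later take facetName=disease for the otherEntity_disease field, which is what the examples on this page suggest.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://documentapi.brightplanet.com/documentapi"  # from the example above


def data_feed_url(api_key: str, data_feed: str) -> str:
    # GET /dataFeed/{dataFeed}; quote() escapes any unusual characters in the name.
    name = urllib.parse.quote(data_feed, safe="")
    return f"{BASE_URL}/dataFeed/{name}?" + urllib.parse.urlencode({"api_key": api_key})


def facet_fields(payload) -> dict[str, list[str]]:
    """Map each feed name in a /dataFeed/{dataFeed} response to its facetFields."""
    entries = payload if isinstance(payload, list) else [payload]
    return {e["dataFeedName"]: e.get("facetFields", []) for e in entries}


def short_facet_name(field: str) -> str:
    # Assumption: count calls take facetName=disease for otherEntity_disease.
    return field.removeprefix("otherEntity_")


def fetch_facet_fields(api_key: str, data_feed: str) -> list[str]:
    """Request the facetFields available in one data feed."""
    with urllib.request.urlopen(data_feed_url(api_key, data_feed), timeout=30) as resp:
        return facet_fields(json.load(resp)).get(data_feed, [])
```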

GET /docs/search

With a data feed selected, you can begin requesting documents using the /docs/search request.

HTTP GET
/docs/search

This request is the main access point to all content and accepts the following request parameters:

Parameter	Description
dataFeed	The name of the data feed from which you receive data (must match one of the data feed names returned by the /dataFeeds request).
query*	The keyword query string that controls which documents are returned.
start	The row at which you would like to start receiving data. This is useful for splitting a large result set across multiple requests.
rows	The maximum number of documents to return from the request.
startDate	The earliest harvest date of documents to return. Must be passed in YYYY-MM-DD format.
endDate	The latest harvest date of documents to return. Must be passed in YYYY-MM-DD format.
api_key	The static passkey given to the user by BrightPlanet.
facetFields	Controls which nonstandard entities to include in the response. Standard entities that are always returned are People, Companies, and Places.
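As an illustration, the parameters above can be assembled into a request URL with Python's standard library. This is a sketch, not an official client: search_url is a hypothetical helper, the base URL is taken from the examples on this page, and the date check runs locally before any request is sent. Passing quote_via=urllib.parse.quote writes spaces as %20, matching the example URLs.

```python
import datetime
import re
import urllib.parse

BASE_URL = "https://documentapi.brightplanet.com/documentapi"  # from the examples
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def _check_date(value: str) -> str:
    # Require the literal YYYY-MM-DD shape, then confirm it is a real calendar date.
    if not _DATE_RE.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    datetime.datetime.strptime(value, "%Y-%m-%d")
    return value


def search_url(api_key, data_feed, query, start=0, rows=10,
               start_date=None, end_date=None, facet_fields=None) -> str:
    """Build a /docs/search URL from the parameters in the table above."""
    params = {"api_key": api_key, "dataFeed": data_feed, "query": query,
              "start": start, "rows": rows}
    if start_date:
        params["startDate"] = _check_date(start_date)
    if end_date:
        params["endDate"] = _check_date(end_date)
    if facet_fields:
        params["facetFields"] = ",".join(facet_fields)
    # quote_via=quote percent-encodes every value and writes spaces as %20.
    return f"{BASE_URL}/docs/search?" + urllib.parse.urlencode(
        params, quote_via=urllib.parse.quote)
```

For example, search_url("YOUR_API_KEY", "bits", "microsoft") produces the same request as the Microsoft example below, minus the explicit :443 port.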

Building a Query

To request ten documents that contain the term Microsoft, pass the following parameters (all other parameters are omitted):

https://documentapi.brightplanet.com:443/documentapi/docs/search?api_key=[Your_Api_Key]&dataFeed=bits&query=microsoft&start=0&rows=10

The following JSON output is returned and contains all the harvested and enriched data from the search request.

{
   "datafeed": "bits",
   "query": "microsoft",
   "start": 0,
   "rows": 10,
   "endDate": "2014-11-20T20:52:03.204+0000",
   "documents": [
      {
         "source": "http://feeds.reuters.com/reuters/companyNews",
         "initialUrl": "http://feeds.reuters.com/~r/reuters/companyNews/~3/Z2wMAfWzJJ4/story01.htm",
         "finalUrl": "http://feeds.reuters.com/~r/reuters/companyNews/~3/Z2wMAfWzJJ4/story01.htm",
         "mimeType": "XML Doc",
         "docId": "18269999",
         "docSummary": " Rockstar agree to settle patent litigation-filing ",
         "enrichedDoc": "Rockstar agree to settle patent litigation-filing12:01pm EST Yahoo to replace Google as default....",
         "title": "UPDATE 1-Google, Rockstar agree to settle patent litigation -filingn| Reuters",
         "status": "new",
         "harvestDate": "2014-11-20T19:24:05.000+0000"
      },
      {
         "includedFacets": {
            "otherEntity_company": ["Apple", "Microsoft", "Uber Technologies"],
            "otherEntity_person": ["Louella Parsons", "Michael Wolff", "David Paul Morris", "Bob Pittman"],
            "otherEntity_place": ["New York", "Waverly", "Hollywood"]
         }
      }
   ]
}
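Because start and rows work together, a large result set can be fetched in pages. The sketch below assumes that every response carries a documents list, as in the output above, and that a page shorter than the rows requested means no more results remain. The fetch argument is a stand-in for whatever function performs the real HTTP request, passed in so the paging logic can be tested offline; the default limit is an arbitrary safety cap.

```python
from typing import Callable, Iterator

# fetch(start, rows) performs one /docs/search request with those paging
# parameters and returns the decoded JSON response (a dict with "documents").
Fetch = Callable[[int, int], dict]


def iter_documents(fetch: Fetch, rows: int = 100, limit: int = 1000) -> Iterator[dict]:
    """Yield up to `limit` documents, requesting `rows` at a time.

    Assumes a page shorter than requested means the results are exhausted.
    """
    start = 0
    while start < limit:
        page_size = min(rows, limit - start)
        docs = fetch(start, page_size).get("documents", [])
        yield from docs
        if len(docs) < page_size:
            break
        start += len(docs)
```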

facetFields Parameter

The facetFields parameter controls which additional (nonstandard) facets are included in the response when requesting documents. Companies, people, and places tagged within the documents are always included. To request additional facets, pass a comma-separated list of facet names.

For example, to also receive the crimes, weapons, and drugs mentioned within the documents, pass the following:

facetFields=crime,weapon,drug

Note that facet names are case sensitive: write them all in lower case, with no spaces after the commas.

The HTTP request is as follows.

https://documentapi.brightplanet.com:443/documentapi/docs/search?api_key=YOUR_API_KEY&dataFeed=bits&query=test&start=0&rows=10&facetFields=crime%2Cweapon%2Cdrug

Each data feed has its own set of facetFields. To see which facetFields are available for a specific data feed, use the GET /dataFeed/{dataFeed} request or e-mail [email protected].
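A small helper can enforce the formatting rules above (lower case, no spaces around the commas) before the value is URL-encoded. facet_fields_param is a hypothetical name; the encoded result matches the %2C separators in the example request.

```python
import urllib.parse


def facet_fields_param(names) -> str:
    """Join facet names for facetFields: trimmed, lower case, no spaces."""
    cleaned = [name.strip().lower() for name in names if name.strip()]
    return ",".join(cleaned)


value = facet_fields_param(["Crime", " Weapon", "drug "])
print(urllib.parse.urlencode({"facetFields": value}))  # facetFields=crime%2Cweapon%2Cdrug
```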

Query Parameter

The query parameter of the /docs/search call is a highly flexible parameter that controls which documents are returned. It supports Boolean operators, wildcard searches, proximity operators, and searches that match documents on tagged entities and metadata.

Query	Returns documents…
Big AND Data	Containing both “Big” and “Data”
Bigger OR Data	Containing either “Bigger” or “Data”
+Big -Data	Containing “Big” but not “Data”
Te?t	Containing any word that starts with “te”, has exactly one character in between, and ends with “t”, such as text or test
Te*t	Containing any word that starts with “te”, has any number of characters in between, and ends with “t”, such as tempt or tenant
otherEntity_person:"Barack Obama"	Containing Barack Obama tagged as a person
otherEntity_place:"Paris, France"	Containing Paris, France tagged as a place
"boycott Google"~5	Containing the words “boycott” and “Google” within 5 words of each other
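When queries are generated programmatically, small helpers keep the quoting consistent with the syntax in the table. The helpers below are hypothetical and only build strings. Escaping an embedded double quote with a backslash is an assumption borrowed from common Lucene-style query syntax and is not confirmed by this page.

```python
def phrase(text: str) -> str:
    """Wrap text in straight double quotes (backslash-escaping is an assumption)."""
    return '"' + text.replace('"', '\\"') + '"'


def entity(field: str, value: str) -> str:
    """Entity clause such as otherEntity_person:"Barack Obama"."""
    return f"{field}:{phrase(value)}"


def near(word_a: str, word_b: str, distance: int) -> str:
    """Proximity clause: both words within `distance` words of each other."""
    return f"{phrase(word_a + ' ' + word_b)}~{distance}"


def all_of(*clauses: str) -> str:
    """Join clauses so that every one of them must match."""
    return " AND ".join(clauses)
```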

Best Practices

Ensure that all searches are properly URL-encoded
Limit your searches to a maximum of 10 operators
Use your dashboard to help quickly filter results and develop queries
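The first two practices can be checked in code before a request goes out. The operator count below is a rough heuristic (Boolean keywords plus leading + or - signs outside quoted phrases), not a rule the API is documented to enforce, and encode_query is a hypothetical helper name.

```python
import re
import urllib.parse

MAX_OPERATORS = 10  # from the best practice above


def count_operators(query: str) -> int:
    """Roughly count AND/OR/NOT keywords and +/- prefixes outside quoted phrases."""
    unquoted = re.sub(r'"[^"]*"', " ", query)
    keywords = re.findall(r"\b(?:AND|OR|NOT)\b", unquoted)
    prefixes = re.findall(r"(?:^|[\s(])[+-](?=\w)", unquoted)
    return len(keywords) + len(prefixes)


def encode_query(query: str) -> str:
    """Percent-encode a query for the URL after checking its operator count."""
    if count_operators(query) > MAX_OPERATORS:
        raise ValueError("query uses more than 10 operators; consider simplifying it")
    # safe="" also encodes characters such as / : + and double quotes.
    return urllib.parse.quote(query, safe="")
```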

Exploring Document and Entity Counts

BrightPlanet’s Document API can also return counts of documents, or of entities tagged within the data feed, that match your specific query. Three calls return count data:

GET /docs/count – Returns the total number of documents for a given facetName/facetValue
GET /docs/facet/date/count – Returns the number of times a facet value is tagged, grouped by date
GET /docs/facet/count – Returns the number of times each value of a facet is tagged for a query

Count Parameters

The count parameters are largely the same across all the GET count requests; each parameter that can be passed is described below.

Parameter	Description
dataFeed	The name of the data feed to count against (must match one of the data feed names returned by the /dataFeeds request).
facetName	The name of the facet or entity type to count, such as disease or person.
facetValue	The specific extracted entity to count, such as Cancer.
startDate	The earliest harvest date of documents to include. Must be passed in YYYY-MM-DD format.
endDate	The latest harvest date of documents to include. Must be passed in YYYY-MM-DD format.
api_key	The static passkey given to the user by BrightPlanet.
query	The keyword query string that controls which documents are counted.
dateGap	The span of days into which results are grouped, for example +1DAY, +3DAY, +7DAY, or +10DAY. The + must be URL-encoded as %2B.
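Since the three count calls share most of their parameters, one builder can cover them. This sketch assumes the base URL and paths shown in the examples on this page; count_url and COUNT_PATHS are hypothetical names. Given the arguments from the weekly Barack Obama example, it reproduces that request URL.

```python
import urllib.parse

BASE_URL = "https://documentapi.brightplanet.com/documentapi"
COUNT_PATHS = {
    "count": "/docs/count",
    "date": "/docs/facet/date/count",
    "facet": "/docs/facet/count",
}


def count_url(kind: str, api_key: str, data_feed: str, **params) -> str:
    """Build a count request URL; params use the API's own names (facetName, ...)."""
    query = {"api_key": api_key, "dataFeed": data_feed}
    query.update({name: value for name, value in params.items() if value is not None})
    # quote (rather than the default quote_plus) writes spaces as %20 instead of +;
    # both encode the + in "+7DAY" as %2B.
    return (BASE_URL + COUNT_PATHS[kind] + "?"
            + urllib.parse.urlencode(query, quote_via=urllib.parse.quote))
```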

GET /docs/count

The /docs/count call returns the number of documents that contain a specified facet or entity. For example, to count all the documents that mention the disease cancer, set facetName to disease and facetValue to Cancer, and include your api_key and dataFeed.

The request would be:

https://documentapi.brightplanet.com/documentapi/docs/count?api_key=YOUR_API_KEY&dataFeed=bits&facetName=disease&facetValue=Cancer

The output simply displays the count, 101,764:

{  
"dataFeed":"bits",
"start":0,
"rows":0,
"totalCount":101764,
"endDate":"2015-02-11T17:08:49.591+0000"
}

GET /docs/facet/date/count

The /docs/facet/date/count call shows counts of a specific entity over time. For example, to see how often Barack Obama is mentioned in the Global News Data Feed each week, pass ‘person’ as the facetName and ‘Barack Obama’ (in double quotes) as the facetValue, specify the start and end dates, and set dateGap to +7DAY to group the counts into seven-day intervals. The request URL looks like this:

https://documentapi.brightplanet.com/documentapi/docs/facet/date/count?api_key=YOUR_API_KEY&dataFeed=bits&facetName=person&facetValue=%22Barack%20Obama%22&startDate=2015-01-01&endDate=2015-02-11&dateGap=%2B7DAY

The response then displays the count for Barack Obama in each seven-day interval:

{  
"dataFeed":"bits",
"query":"otherEntity_person:\"Barack Obama\"",
"start":0,
"rows":0,
"totalCount":207209,
"startDate":"2015-01-01T06:00:00.000+0000",
"endDate":"2015-02-11T06:00:00.000+0000",
"facetCount":{  
"2015-01-08T06:00:00Z":3270,
"2015-02-05T06:00:00Z":2224,
"2015-01-22T06:00:00Z":3804,
"2015-01-15T06:00:00Z":4132,
"2015-01-01T06:00:00Z":2636,
"2015-01-29T06:00:00Z":4655
}
}
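JSON objects carry no ordering guarantee, and the buckets in the response above are indeed out of date order. A minimal sketch for restoring chronological order, assuming the bucket keys always use the YYYY-MM-DDTHH:MM:SSZ form shown; buckets_in_order is a hypothetical helper name.

```python
from datetime import datetime


def buckets_in_order(facet_count: dict[str, int]) -> list[tuple[datetime, int]]:
    """Return (bucket start, count) pairs sorted chronologically."""
    parsed = [
        (datetime.strptime(key, "%Y-%m-%dT%H:%M:%SZ"), count)
        for key, count in facet_count.items()
    ]
    return sorted(parsed)


# Buckets copied from the response above.
sample = {
    "2015-01-08T06:00:00Z": 3270,
    "2015-02-05T06:00:00Z": 2224,
    "2015-01-22T06:00:00Z": 3804,
    "2015-01-15T06:00:00Z": 4132,
    "2015-01-01T06:00:00Z": 2636,
    "2015-01-29T06:00:00Z": 4655,
}
for start, count in buckets_in_order(sample):
    print(start.date(), count)
```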

GET /docs/facet/count

This call returns the number of mentions of each entity within the facet you pass. For example, to see the counts of diseases mentioned in the Global News Data Feed from January 1 to February 11, 2015, use the following HTTP request:

https://documentapi.brightplanet.com/documentapi/docs/facet/count?api_key=YOUR_API_KEY&dataFeed=bits&query=*%3A*&facetName=disease&startDate=2015-01-01&endDate=2015-02-11

Passing *:* as the query matches all data within the given date range, and passing disease as the facetName returns the counts for each disease. The JSON output is shown below:

{  
"dataFeed":"bits",
"query":"*:*",
"start":0,
"rows":0,
"totalCount":849287,
"startDate":"2015-01-01T06:00:00.000+0000",
"endDate":"2015-02-11T06:00:00.000+0000",
"facetCount":{  
"Polio":826,
"Communicable Diseases":271,
"Avian Flu":356,
"H1n1":656,
"Dengue":294,
"Glaucoma":225,
"Hepatitis B":164,
"Cholera":482
   }
}
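To turn a facet count response into a ranked list, sort the facetCount entries by count. A minimal sketch using the sample values from the response above; top_entities is a hypothetical helper name, and ties are broken alphabetically as an arbitrary choice.

```python
def top_entities(facet_count: dict[str, int], n: int = 5) -> list[tuple[str, int]]:
    """Return the n most-mentioned entities, highest count first (ties by name)."""
    return sorted(facet_count.items(), key=lambda item: (-item[1], item[0]))[:n]


# Counts copied from the response above.
disease_counts = {"Polio": 826, "Communicable Diseases": 271, "Avian Flu": 356,
                  "H1n1": 656, "Dengue": 294, "Glaucoma": 225,
                  "Hepatitis B": 164, "Cholera": 482}
print(top_entities(disease_counts, 3))  # [('Polio', 826), ('H1n1', 656), ('Cholera', 482)]
```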

GET /docs/enrichments/{docMasterId}

When passed a docMasterId, this call returns enrichments of the entities in that document. Enrichments include confidence and polarity scores for entities within the document, as well as document properties such as external domains and URLs. Confidence is a score from 0 to 100 that indicates how important a specific entity is to the document. Polarity is a ranking from -3 to +3 that indicates how positive or negative that entity is within the document.

To find the enrichments for the document with a docMasterId of 988654, the request URL would be:

https://documentapi.brightplanet.com/documentapi/docs/enrichments/988654?api_key=YOURAPIKEY&dataFeed=bits

The returned payload shows all the enrichments for that document:

{
  "dataFeed": "bits",
  "companies": [
    {
      "id": 35670,
      "companyName": "Department Of Justice",
      "docOffset": 507,
      "confidence": 27,
      "polarity": 0
    }
  ],
  "otherEntities": [
    {
      "id": 324826,
      "value": "Chief Executive",
      "normalized": "Chief Executive",
      "entityType": "org",
      "docOffset": 139,
      "confidence": 34,
      "polarity": 0
    }
  ],
  "properties": [
    {
      "id": 4,
      "propertyType": "host",
      "value": "twitter.com"
    },
    {
      "id": 18,
      "propertyType": "host",
      "value": "www.linkedin.com"
    }
  ],
  "relationships": [
    {
      "id": 39062579,
      "relationshipType": "PersonToOrg",
      "subject": {
        "id": 370610,
        "value": "Leung",
        "normalized": "Leung",
        "entityType": "person",
        "docOffset": 395,
        "confidence": 65,
        "polarity": 0
      },
      "object": {
        "id": 334613,
        "value": "Department Of Justice",
        "normalized": "Department Of Justice",
        "entityType": "org",
        "docOffset": 507,
        "confidence": 27,
        "polarity": 0
      }
    }
  ]
}
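A sketch of how a client might use the confidence and polarity scores, assuming the field names in the payload above. The helper names, the default threshold of 30, and labelling companies with an entityType of "company" are illustrative assumptions; relationships and properties are ignored here.

```python
from typing import Iterator


def entities(enrichment: dict) -> Iterator[dict]:
    """Yield a flat record for each company and other entity in the payload."""
    for company in enrichment.get("companies", []):
        # Companies carry no entityType in the payload; "company" is an illustrative label.
        yield {"value": company["companyName"], "entityType": "company",
               "confidence": company["confidence"], "polarity": company["polarity"]}
    for other in enrichment.get("otherEntities", []):
        yield {"value": other["normalized"], "entityType": other["entityType"],
               "confidence": other["confidence"], "polarity": other["polarity"]}


def important_entities(enrichment: dict, min_confidence: int = 30) -> list[dict]:
    """Entities with confidence (0-100) at or above the threshold, highest first."""
    kept = [e for e in entities(enrichment) if e["confidence"] >= min_confidence]
    return sorted(kept, key=lambda e: e["confidence"], reverse=True)


def polarity_label(score: int) -> str:
    """Map a polarity score (-3 to +3) to a coarse label."""
    if not -3 <= score <= 3:
        raise ValueError(f"polarity out of range: {score}")
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```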

Example Searches

Find documents that mention some keywords but not others

Suppose you want documents that mention the phrase “Big Data” and the word Unstructured but don’t discuss Hadoop or HDFS. Use the following query:

("Big Data" AND Unstructured) NOT Hadoop NOT HDFS

The request URL is:

https://documentapi.brightplanet.com:443/documentapi/docs/search?api_key=YOUR_API_KEY&dataFeed=bits&query=(%22Big%20Data%22%20AND%20Unstructured)%20NOT%20Hadoop%20NOT%20HDFS

Know when two different entities are mentioned within a document

Suppose you want all documents that mention Barack Obama as a person and also mention the company Apple. A simple keyword search for “Barack Obama” and “Apple” also returns mentions of apples as a fruit. To match Apple only when it is tagged as a company, use the following query:

+otherEntity_person:"Barack Obama" AND +otherEntity_company:"Apple"

Field names are case sensitive, so follow this syntax exactly.

The request URL is:

https://documentapi.brightplanet.com:443/documentapi/docs/search?api_key=YOUR_API_KEY&dataFeed=bits&query=%2BotherEntity_person%3A%22Barack%20Obama%22%20AND%20%2BotherEntity_company%3A%22Apple%22

Find documents that contain a specific title

Suppose you want to search the Global News Data Feed for any documents that mention Google in the title. Use the title: field and pass *Google* to match the keyword Google preceded and followed by anything. Use the following query:

title:*Google*

The request URL is:

https://documentapi.brightplanet.com:443/documentapi/docs/search?api_key=YOUR_API_KEY&dataFeed=bits&query=title%3A*Google*

Additional Questions

Do you have additional questions about using the search API that are not answered here? E-mail your questions to our support team at [email protected].
