API
The BrightPlanet Document API is part of BrightPlanet’s REST API. It allows queries against the curated data feeds provided by our Data-as-a-Service platform and behaves similarly to the search feature available in BrightPlanet’s Search Dashboard.
Before digging in, it’s important to know that the Document API is focused on content within the data feed. This means that results will only contain content that has already been harvested from your subscribed data feeds. This quick start guide will show you how to find an appropriate data feed and then begin requesting content from that data feed. It’s also important to note that BrightPlanet’s Document API can be found and tested in your browser here: https://api.brightplanet.com/.
Quick Start Guide
The API is built around the docs/search call, which lets users request data from BrightPlanet’s harvested documents using a highly flexible query engine.
Getting an API Key
Before making any requests you must have an API Key. If you do not have a key, please contact BrightPlanet’s support at [email protected] to request one.
An API Key is a unique identifier associated with your account and license. All calls made to our API require an API Key to be passed along with the call. Each call is also metered and logged with your key for audit purposes.
Never share your API Key. Any applications built around our API should allow the end-user to enter their own API Key instead of embedding your API Key.
An API Key is a GUID and looks something like this: 12345678-90ab-cdef-1234-567890abcdef
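One way to follow the advice above is to read the key at runtime instead of embedding it, and to check that it has the GUID shape before sending any request. The sketch below is illustrative only: the environment variable name `BRIGHTPLANET_API_KEY` and both helper names are assumptions, not part of the API.

```python
import os
import re

# Hypothetical environment variable name; pick whatever suits your application.
API_KEY_ENV_VAR = "BRIGHTPLANET_API_KEY"

# Hyphenated 8-4-4-4-12 hexadecimal form, as in the example key above.
GUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def is_guid(value: str) -> bool:
    """Return True if value looks like a hyphenated GUID."""
    return GUID_PATTERN.fullmatch(value) is not None


def load_api_key() -> str:
    """Read the end user's key from the environment and validate its shape."""
    key = os.environ.get(API_KEY_ENV_VAR, "").strip()
    if not is_guid(key):
        raise ValueError(f"{API_KEY_ENV_VAR} is missing or is not a GUID")
    return key
```

A shape check only catches typos and empty values; whether the key is actually valid is decided by the API itself.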
Get /dataFeeds
Once you have a valid API Key you can view which data feeds (or databases) you have access to using the “/dataFeeds” endpoint. BrightPlanet provides both standard and custom data feeds for customers. Each customer will only have access to the data feeds that they have licensed. To learn more about additional data feeds, contact your sales representative.
The “/dataFeeds” endpoint is only needed to find out which data feeds your API key has access to. Because that list does not change from one request to the next, it is fine to cache the data feed names between sessions.
HTTP GET
/dataFeeds
When using the “/dataFeeds” request, users will need to pass their api_key. Note that the dataFeeds call is case sensitive. The URL request below shows an example.
https://documentapi.brightplanet.com:443/documentapi/dataFeeds?api_key=[Your_Api_Key]
This will list the data feeds accessible for this api_key as well as the total number of used API requests per data feed. Your maximum number of requests depends on your license agreement. Once your usage reaches that maximum, additional API requests will produce a rate limit error.
The below response shows that this api_key has access to one data feed called “bits” and they have made 13 API requests already.
{ "datafeedName": "bits", "usedRequests": 13 }
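As a rough illustration of the request and response above, the following Python sketch builds the /dataFeeds URL and turns a response body into a name-to-usage mapping. The helper names are hypothetical, and because this guide shows the feed name key spelled both "datafeedName" and "dataFeedName", the parser accepts either; treat that as an assumption about the payload rather than a documented guarantee.

```python
import json
from urllib.parse import urlencode

# Host and path copied from the example request above.
BASE_URL = "https://documentapi.brightplanet.com/documentapi"


def datafeeds_url(api_key: str) -> str:
    """Build the case-sensitive /dataFeeds request URL."""
    return f"{BASE_URL}/dataFeeds?{urlencode({'api_key': api_key})}"


def feed_usage(payload: str) -> dict:
    """Map each data feed name to its used request count.

    Accepts either a single JSON object or a list of objects.
    """
    data = json.loads(payload)
    feeds = data if isinstance(data, list) else [data]
    usage = {}
    for feed in feeds:
        name = feed.get("dataFeedName", feed.get("datafeedName"))
        usage[name] = feed["usedRequests"]
    return usage


sample = '{ "datafeedName": "bits", "usedRequests": 13 }'
print(feed_usage(sample))  # {'bits': 13}
```

Since the list of feeds rarely changes, the result of `feed_usage` is a natural thing to cache between sessions.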
Get /dataFeed/{dataFeed}
Each data feed that you have access to has custom entities that are extracted from the data as it is harvested. To view the custom entities that are available in a specific data feed, use the GET /dataFeed/{dataFeed} request.
An example request for the Global News Data Feed would be:
https://documentapi.brightplanet.com/documentapi/dataFeed/bits?api_key=[Your_Api_Key]
The response then displays all the facetFields that are available in the Global News Data Feed:
[ { "dataFeedName":"bits", "usedRequests":84, "facetFields":[ "otherEntity_publication", "otherEntity_disease", "otherEntity_relationshipDisease", "otherEntity_person", "otherEntity_drug", "otherEntity_place", "otherEntity_relationshipPlace", "otherEntity_certification", "otherEntity_facility", "otherEntity_weapon", "otherEntity_company", "otherEntity_chemical" ] } ]
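A small sketch of how the response above might be consumed: it collects the facetFields into a set so that a field name can be checked before it is used in a query. The function name is hypothetical, and the sample payload is a shortened copy of the response shown above.

```python
import json


def available_facet_fields(payload: str) -> set:
    """Collect every facetFields entry from a /dataFeed/{dataFeed} response."""
    fields = set()
    for feed in json.loads(payload):
        fields.update(feed.get("facetFields", []))
    return fields


# Shortened copy of the example response.
sample = """
[ { "dataFeedName": "bits", "usedRequests": 84,
    "facetFields": [ "otherEntity_publication", "otherEntity_disease",
                     "otherEntity_person", "otherEntity_company" ] } ]
"""

fields = available_facet_fields(sample)
print("otherEntity_disease" in fields)  # True
print("otherEntity_crime" in fields)    # False
```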
Get /docs/search
Now we have reached the point where we can begin requesting documents using the /docs/search request.
HTTP GET
/docs/search
This request is the main access point to all content and accepts the following request parameters:
Parameter | Description |
---|---|
dataFeed | The name of the data feed from which you receive data (must match one of the data feed names returned from the /dataFeeds request). |
query* | The actual keyword string query that controls which documents are returned. |
start | The row at which the user would like to start receiving data. This works well for splitting up requests for large amounts of data in multiple requests. |
rows | The maximum number of individual documents to be returned from the request. |
startDate | The earliest document you’d like to receive from the API based off the document harvest date. Must be passed in YYYY-MM-DD format. |
endDate | The most recent document you’d like to receive from the API based off the document harvest date. Must be passed in YYYY-MM-DD format. |
api_key | The static passkey (API Key) given to the user by BrightPlanet. |
facetFields | Controls which nonstandard entities to include in the return. Standard entities that are always returned include: People, Companies, and Places. |
Building a Query
To request ten documents that contain the term Microsoft, pass the following parameters (all of the other parameters are left blank):
https://documentapi.brightplanet.com:443/documentapi/docs/search?api_key=[Your_Api_Key]&dataFeed=bits&query=microsoft&start=0&rows=10
The following JSON output is returned and contains all the harvested and enriched data from the search request.
{ "datafeed": "bits", "query": "microsoft", "start": 0, "rows": 10, "endDate": "2014-11-20T20:52:03.204+0000", "documents": [ { "source": "http://feeds.reuters.com/reuters/companyNews", "initialUrl": "http://feeds.reuters.com/~r/reuters/companyNews/~3/Z2wMAfWzJJ4/story01.htm", "finalUrl": "http://feeds.reuters.com/~r/reuters/companyNews/~3/Z2wMAfWzJJ4/story01.htm", "mimeType": "XML Doc", "docId": "18269999", "docSummary": " Rockstar agree to settle patent litigation-filing ", "enrichedDoc": "Rockstar agree to settle patent litigation-filing12:01pm EST Yahoo to replace Google as default....", "title": "UPDATE 1-Google, Rockstar agree to settle patent litigation -filingn| Reuters", "status": "new", "harvestDate": "2014-11-20T19:24:05.000+0000"}, { "includedFacets": { "otherEntity_company": ["Apple","Microsoft","Uber Technologies"], "otherEntity_person": ["Louella Parsons","Michael Wolff","David Paul Morris","Bob Pittman"], "otherEntity_place": ["New York","Waverly","Hollywood"] }
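The request above can be assembled programmatically. The sketch below is a minimal example using only the Python standard library; the helper names (`search_url`, `page_urls`, `fetch_json`) are hypothetical, and the parameter names are taken from the example URL. It also shows how start and rows can split a large result set across several requests.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Host and path copied from the example request above (default HTTPS port).
BASE_URL = "https://documentapi.brightplanet.com/documentapi"


def search_url(api_key, data_feed, query, start=0, rows=10, **optional):
    """Build a /docs/search URL; optional parameters set to None are omitted."""
    params = {
        "api_key": api_key,
        "dataFeed": data_feed,
        "query": query,
        "start": start,
        "rows": rows,
    }
    params.update({k: v for k, v in optional.items() if v is not None})
    # quote_via=quote encodes spaces as %20, matching the URLs in this guide.
    return f"{BASE_URL}/docs/search?{urlencode(params, quote_via=quote)}"


def page_urls(api_key, data_feed, query, total, rows=10):
    """One URL per page of `rows` documents, covering `total` documents."""
    return [
        search_url(api_key, data_feed, query, start=start, rows=rows)
        for start in range(0, total, rows)
    ]


def fetch_json(url, timeout=30):
    """Perform the GET request and decode the JSON body (not called here)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(search_url("YOUR_API_KEY", "bits", "microsoft"))
```

Each call to `fetch_json` counts against your metered requests, so prefer a larger rows value over many small pages where your license allows it.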
facetFields Parameter
The facetFields parameter controls which non-standard facets are included in the return format when requesting documents. Companies, People, and Places tagged within the documents are always included. To request additional facets, users pass a comma-separated list of additional facets for inclusion.
A user who also wants to receive Crimes, Weapons, and Drugs mentioned within the documents simply needs to pass the following.
facetFields=crime,weapon,drug
Note that the facets passed are case sensitive and should all be expressed in lower case, separated by commas with no spaces in between. Available facets change with each feed. For a list of all facets available, email [email protected].
The HTTP request is as follows.
https://documentapi.brightplanet.com:443/documentapi/docs/search?api_key=YOUR_API_KEY&dataFeed=bits&query=test&start=0&rows=10&facetFields=crime%2Cweapon%2Cdrug
Each data feed has unique facetFields available for querying. To identify which facetFields are available for a specific data feed, e-mail [email protected].
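To illustrate the formatting rules above (lower case, comma separated, no spaces), here is a short sketch that normalizes a list of facet names and encodes it for the URL. The helper name is an assumption made for this example.

```python
from urllib.parse import urlencode


def facet_fields_param(facets):
    """Join facet names as a lower-case, comma-separated list with no spaces."""
    cleaned = [facet.strip().lower() for facet in facets if facet.strip()]
    return ",".join(cleaned)


value = facet_fields_param(["Crime", " weapon", "drug "])
print(value)                              # crime,weapon,drug
print(urlencode({"facetFields": value}))  # facetFields=crime%2Cweapon%2Cdrug
```

The encoded form matches the facetFields portion of the HTTP request shown above, where each comma appears as %2C.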
Query Parameter
The query parameter that can be passed in the docs/search call is a highly flexible parameter that allows end users to control documents being returned. The query parameter has a large number of features that modify the behavior of the query: Boolean capabilities, wildcard searches, proximity operators, and the ability to search and return documents based on tagged entities and metadata.
Query | Returns documents…. |
---|---|
Big AND Data | Containing both “Big” and “Data” |
Bigger OR Data | Containing either the word Bigger or Data |
+Big -Data | Containing the word Big but not Data |
Te?t | Containing any word that starts with “te”, has exactly one letter in between, and ends with “t”, such as text or test. |
Te*t | Containing any word that starts with “te”, has any number of letters in between, and ends with “t”, such as tempt. |
otherEntity_person:"Barack Obama" | Containing Barack Obama tagged as a person |
otherEntity_place:"Paris, France" | Containing Paris, France tagged as a place |
"boycott Google"~5 | Containing the keywords “boycott” and “Google” within 5 words of each other |
Best Practices
- Ensure that all of the searches are properly encoded
- Limit your searches to a maximum of 10 operators
- Use your dashboard to help quickly filter results and develop queries
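For the first point, a minimal sketch of percent-encoding a query string with Python's standard library is shown below. The function name is hypothetical. Parentheses and asterisks are left unencoded only so that the output matches the example URLs in this guide; encoding them as well would be equally valid.

```python
from urllib.parse import quote


def encode_query(query: str) -> str:
    """Percent-encode a query for use as the value of the query parameter."""
    return quote(query, safe="()*")


print(encode_query('("Big Data" AND Unstructured) NOT Hadoop NOT HDFS'))
# (%22Big%20Data%22%20AND%20Unstructured)%20NOT%20Hadoop%20NOT%20HDFS

print(encode_query("+Big -Data"))
# %2BBig%20-Data
```

Encoding the plus sign matters in particular: an unencoded + in a URL query string is commonly decoded as a space, which would silently drop the "required" operator.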
Exploring Document and Entity Counts
BrightPlanet’s Document Search API can return counts of the documents or tagged entities within a data feed that match your specific query. Three calls return count data:
- GET /docs/count – Returns total number of documents for a given facetname/facetvalue
- GET /docs/facet/date/count – Returns number of times a facet is tagged by date
- GET /docs/facet/count – Returns number of times a facet is tagged by query
Count Parameters
The count parameters are fairly consistent across all the GET count requests. Each count parameter that can be passed is described below.
Parameter | Description |
---|---|
dataFeed | The name of the data feed from which you receive data (must match one of the data feed names returned from the /dataFeeds request). |
facetName | The name of the facet or entity type that you want to include. |
facetValue | The specific extracted entity that you would like included. |
startDate | The earliest document you’d like to receive from the API based off the document harvest date. Must be passed in YYYY-MM-DD format. |
endDate | The most recent document you’d like to receive from the API based off the document harvest date. Must be passed in YYYY-MM-DD format. |
api_key | The static passkey (API Key) given to the user by BrightPlanet. |
query | The actual keyword string query that controls which documents are returned. |
dateGap | The span, in days, that results are grouped into. For example +1DAY, +3DAY, +7DAY, +10DAY, etc. |
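The date-related parameters in the table can be produced from Python date objects, which avoids hand-formatting mistakes. This is a sketch with hypothetical helper names; the "+NDAY" form and the YYYY-MM-DD format are taken from the table above.

```python
from datetime import date
from urllib.parse import quote, urlencode


def date_gap(days: int) -> str:
    """Format a grouping span such as +7DAY."""
    if days < 1:
        raise ValueError("dateGap must be at least one day")
    return f"+{days}DAY"


def date_range_params(start: date, end: date, gap_days: int) -> dict:
    """Build startDate, endDate, and dateGap in the formats the table describes."""
    if end < start:
        raise ValueError("endDate must not be earlier than startDate")
    return {
        "startDate": start.isoformat(),  # YYYY-MM-DD
        "endDate": end.isoformat(),
        "dateGap": date_gap(gap_days),
    }


params = date_range_params(date(2015, 1, 1), date(2015, 2, 11), 7)
print(urlencode(params, quote_via=quote))
# startDate=2015-01-01&endDate=2015-02-11&dateGap=%2B7DAY
```

Note that the leading plus sign of dateGap has to be sent as %2B once it is placed in a URL.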
Get /docs/count
The /docs/count call returns the number of documents that contain a specified facet or entity. For example, suppose we want a count of all the documents that mention the disease cancer in any form. We specify ‘disease’ as the facetName, set the facetValue to ‘Cancer’, and include the API key and dataFeed.
The request would be:
https://documentapi.brightplanet.com/documentapi/docs/count?api_key=YOUR_API_KEY&dataFeed=bits&facetName=disease&facetValue=Cancer
The output displays the count of 101,764 in the totalCount field.
{ "dataFeed":"bits", "start":0, "rows":0, "totalCount":101764, "endDate":"2015-02-11T17:08:49.591+0000" }
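A brief sketch of the same call from code: it builds the /docs/count URL from the parameters described above and reads totalCount from a response body. The helper names are hypothetical, and the sample payload is copied from the example response.

```python
import json
from urllib.parse import quote, urlencode

# Host and path copied from the example request above.
BASE_URL = "https://documentapi.brightplanet.com/documentapi"


def count_url(api_key, data_feed, facet_name, facet_value):
    """Build a /docs/count URL for one facet name and value."""
    params = {
        "api_key": api_key,
        "dataFeed": data_feed,
        "facetName": facet_name,
        "facetValue": facet_value,
    }
    return f"{BASE_URL}/docs/count?{urlencode(params, quote_via=quote)}"


def total_count(payload: str) -> int:
    """Extract totalCount from a count response body."""
    return int(json.loads(payload)["totalCount"])


sample = (
    '{ "dataFeed":"bits", "start":0, "rows":0, "totalCount":101764, '
    '"endDate":"2015-02-11T17:08:49.591+0000" }'
)
print(count_url("YOUR_API_KEY", "bits", "disease", "Cancer"))
print(f"{total_count(sample):,}")  # 101,764
```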
Get /docs/facet/date/count
The /docs/facet/date/count call lets you see counts of a specific entity over time. For example, suppose we want to see how often Barack Obama is mentioned each week in the Global News Data Feed. To get this result, we pass ‘person’ as the facetName and ‘Barack Obama’ as the facetValue, specify the start and end dates, and set the dateGap to +7DAY to group the counts into seven-day periods. Your request URL looks like this:
https://documentapi.brightplanet.com/documentapi/docs/facet/date/count?api_key=YOUR_API_KEY&dataFeed=bits&facetName=person&facetValue=%22Barack%20Obama%22&startDate=2015-01-01&endDate=2015-02-11&dateGap=%2B7DAY
The Response then displays the count of Barack Obama grouped every seven days.
{ "dataFeed":"bits", "query":"otherEntity_person:\"Barack Obama\"", "start":0, "rows":0, "totalCount":207209, "startDate":"2015-01-01T06:00:00.000+0000", "endDate":"2015-02-11T06:00:00.000+0000", "facetCount":{ "2015-01-08T06:00:00Z":3270, "2015-02-05T06:00:00Z":2224, "2015-01-22T06:00:00Z":3804, "2015-01-15T06:00:00Z":4132, "2015-01-01T06:00:00Z":2636, "2015-01-29T06:00:00Z":4655 } }
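The facetCount entries in the response above are not in chronological order, so a client will usually want to sort them before charting or reporting. The sketch below does that with the values from the example; the function name is hypothetical, and the timestamp format is inferred from the sample keys.

```python
from datetime import datetime

# facetCount values copied from the example response.
facet_count = {
    "2015-01-08T06:00:00Z": 3270,
    "2015-02-05T06:00:00Z": 2224,
    "2015-01-22T06:00:00Z": 3804,
    "2015-01-15T06:00:00Z": 4132,
    "2015-01-01T06:00:00Z": 2636,
    "2015-01-29T06:00:00Z": 4655,
}


def chronological(counts):
    """Return (date, count) pairs sorted from earliest to latest period."""
    pairs = [
        (datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").date(), count)
        for stamp, count in counts.items()
    ]
    return sorted(pairs)


for day, count in chronological(facet_count):
    print(day.isoformat(), count)
# 2015-01-01 2636
# 2015-01-08 3270
# 2015-01-15 4132
# 2015-01-22 3804
# 2015-01-29 4655
# 2015-02-05 2224
```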
Get /docs/facet/count
This call returns the count of the number of mentions of each entity within a facet that is passed. For example, we want to see the counts of diseases mentioned in the Global News Data Feed from January 1 to February 11. We use the following HTTP Request:
https://documentapi.brightplanet.com/documentapi/docs/facet/count?api_key=YOUR_API_KEY&dataFeed=bits&query=*%3A*&facetName=disease&startDate=2015-01-01&endDate=2015-02-11
We passed *:* as our query to match all data within the given date range, and disease as our facetName parameter to return the counts for each disease. The JSON output is shown below:
{ "dataFeed":"bits", "query":"*:*", "start":0, "rows":0, "totalCount":849287, "startDate":"2015-01-01T06:00:00.000+0000", "endDate":"2015-02-11T06:00:00.000+0000", "facetCount":{ "Polio":826, "Communicable Diseases":271, "Avian Flu":356, "H1n1":656, "Dengue":294, "Glaucoma":225, "Hepatitis B":164, "Cholera":482 } }
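A typical next step with a response like the one above is to rank the entities by how often they were mentioned. The following sketch uses the facetCount values from the example; the function name is an assumption for illustration.

```python
# facetCount values copied from the example response.
facet_count = {
    "Polio": 826,
    "Communicable Diseases": 271,
    "Avian Flu": 356,
    "H1n1": 656,
    "Dengue": 294,
    "Glaucoma": 225,
    "Hepatitis B": 164,
    "Cholera": 482,
}


def top_facets(counts, n=3):
    """Return the n most frequently tagged entities, highest count first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


print(top_facets(facet_count))
# [('Polio', 826), ('H1n1', 656), ('Cholera', 482)]
```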
Get /docs/enrichments/{docMasterId}
This call, when passed a docMasterId, returns enrichments of the entities from that document. Enrichments include confidence and polarity scores for entities within the document. In addition, properties of the document, such as external domains and URLs, are also displayed. Confidence is a score from 0-100 that indicates how important a specific entity is to the given document. Polarity is a score from -3 to +3 that indicates how positive or negative the entity is within the given document.
Let’s find the enrichments for the document with a docMasterId of 988654. The request URL would be:
https://documentapi.brightplanet.com/documentapi/docs/enrichments/988654?api_key=YOUR_API_KEY&dataFeed=bits
The returned payload shows all the enrichments for that document.
{ "dataFeed":"bits", "companies": [ { "id": 35670, "companyName": "Department Of Justice", "docOffset": 507, "confidence": 27, "polarity": 0 } ], "otherEntities": [ { "id": 324826, "value": "Chief Executive", "normalized": "Chief Executive", "entityType": "org", "docOffset": 139, "confidence": 34, "polarity": 0 } ], "properties": [ { "id": 4, "propertyType": "host", "value": "twitter.com" }, { "id": 18, "propertyType": "host", "value": "www.linkedin.com" } ], "relationships": [ { "id": 39062579, "relationshipType": "PersonToOrg", "subject": { "id": 370610, "value": "Leung", "normalized": "Leung", "entityType": "person", "docOffset": 395, "confidence": 65, "polarity": 0 }, "object": { "id": 334613, "value": "Department Of Justice", "normalized": "Department Of Justice", "entityType": "org", "docOffset": 507, "confidence": 27, "polarity": 0 } } ] }
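Because confidence indicates how important an entity is to the document, one simple use of this payload is to keep only entities above a chosen threshold. The sketch below assumes the field names shown in the example payload ("companies", "companyName", "otherEntities", "normalized", "confidence"); the function name and the threshold of 30 are arbitrary choices for illustration.

```python
def confident_entities(enrichments, min_confidence):
    """Return names of companies and other entities at or above min_confidence."""
    names = []
    for company in enrichments.get("companies", []):
        if company["confidence"] >= min_confidence:
            names.append(company["companyName"])
    for entity in enrichments.get("otherEntities", []):
        if entity["confidence"] >= min_confidence:
            names.append(entity["normalized"])
    return names


# Trimmed copy of the example payload.
sample = {
    "dataFeed": "bits",
    "companies": [
        {"id": 35670, "companyName": "Department Of Justice",
         "docOffset": 507, "confidence": 27, "polarity": 0}
    ],
    "otherEntities": [
        {"id": 324826, "value": "Chief Executive", "normalized": "Chief Executive",
         "entityType": "org", "docOffset": 139, "confidence": 34, "polarity": 0}
    ],
}

print(confident_entities(sample, 30))  # ['Chief Executive']
```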
Example Searches
Find documents that mention some keywords but not others
You want to find documents that mention the phrase “Big Data” and the term Unstructured but don’t discuss Hadoop or HDFS. You’ll use the following query:
("Big Data" AND Unstructured) NOT Hadoop NOT HDFS
Your request URL is:
https://documentapi.brightplanet.com:443/documentapi/docs/search?api_key=YOUR_API_KEY&dataFeed=bits&query=(%22Big%20Data%22%20AND%20Unstructured)%20NOT%20Hadoop%20NOT%20HDFS
Know when two different entities are mentioned within a document
You want all documents that mention Barack Obama as a person and also mention the company Apple. A simple keyword search for “Barack Obama” and “Apple” returns other mentions of apples as a fruit. To help search only when Apple is mentioned as a company you use the following query:
+otherEntity_person:"Barack Obama" AND +otherEntity_company:"Apple"
It’s important to note that the field names are case sensitive, so follow the exact syntax.
Your request URL is:
https://documentapi.brightplanet.com:443/documentapi/docs/search?api_key=YOUR_API_KEY&dataFeed=bits&query=%2BotherEntity_person%3A%22Barack%20Obama%22%20AND%20%2BotherEntity_company%3A%22Apple%22
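The entity query in this example can be composed from its parts, which helps keep the case-sensitive field names and the quoting consistent. This is a sketch with hypothetical helper names. It rejects values containing a double quote rather than guessing at an escaping rule, since this guide does not document one.

```python
from urllib.parse import quote


def entity_clause(field: str, value: str, required: bool = True) -> str:
    """Build a clause such as +otherEntity_person:"Barack Obama"."""
    if '"' in value:
        raise ValueError("values containing double quotes are not handled here")
    prefix = "+" if required else ""
    return f'{prefix}{field}:"{value}"'


def all_of(*clauses: str) -> str:
    """Combine clauses so that every one of them must match."""
    return " AND ".join(clauses)


query = all_of(
    entity_clause("otherEntity_person", "Barack Obama"),
    entity_clause("otherEntity_company", "Apple"),
)
print(query)
# +otherEntity_person:"Barack Obama" AND +otherEntity_company:"Apple"
print(quote(query, safe=""))
# %2BotherEntity_person%3A%22Barack%20Obama%22%20AND%20%2BotherEntity_company%3A%22Apple%22
```

The encoded output is the same string that appears as the query value in the request URL above.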
Find documents that contain a specific title
You are looking to search our Global News Data Feed for any documents that mention Google in the title. You need to use the title: field and pass *Google* to specify the keyword Google preceded by anything and followed by anything. You use the following query:
title:*Google*
Your request URL is:
https://documentapi.brightplanet.com:443/documentapi/docs/search?api_key=YOUR_API_KEY&dataFeed=bits&query=title%3A*Google*
Additional Questions
Do you have additional questions about using the search API that are not answered here? E-mail your questions to our support team at [email protected].