New Rosoka Release: Improving Data Enrichment and Entity Extraction

No great operation is ever done alone. BrightPlanet is fortunate to work alongside many technical companies that care deeply about their craft. Without their contributions, we wouldn’t have gained such valuable insight into expanding our own capabilities.

One of BrightPlanet’s long-time partners is Rosoka. We love working with their team and their entity extraction platform.

It feels like we have grown up with Rosoka, the way childhood friends do. As they continue to expand their analytic platform, we will expand our capabilities and product offerings to include their new features and linguistic packages.

New Rosoka Release

Last week, Rosoka released their Rosoka Series 6 platform. The new platform is a huge step toward simplifying implementation, integration, and the updating of custom knowledge bases. Rosoka’s recent updates also include many other improvements and features, and we will soon be rolling them out as part of our Data-as-a-Service solution.

Features and improvements we are excited to deploy for our clients include:

  • Multi-Vector Sentiment Support: polarity, mood, intensity, and aspect for both individual entities and documents as a whole
  • Sentiment Measure: combines the multi-vector values of polarity, mood, intensity, and aspect to create an overall sentiment score for a document
  • More customizable control over how relationships between entities are created
  • New out-of-the-box entities, such as “Profession”, “Nationality”, “Program”, “Citation”, “File Name”, and “User Agent”
  • Improved cyber weapon and dynamic weapon entity extraction
  • Improved detection of familial relationships
  • Improved foreign language name and lexicon support
  • Significant lexical improvements across several other entity types

Rosoka and Our Data Enrichment Process

One of the most popular requests for BrightPlanet’s data enrichment process is sentiment analysis for both specific entity instances and across full documents.

The addition of mood, intensity, and aspect as sentiment vectors will boost the unique insights we’ve already seen with the polarity vector. Being able to apply all the same measurements to documents as a whole provides an additional level of detail at which to analyze sentiment in a custom dataset.

Another popular enrichment feature in our data is the ability to tag relationships between two entities in a subject-predicate-object format. Rosoka has built out robust predicate libraries so you can see exactly how two entities are related in a structured format.

For example, a common relationship type is the person-to-person relationship, which is created when two person entities appear near each other. Consider the two sentences below:

  • Bob Smith spoke with Joe Jones at the baseball game.
  • Abraham Lincoln was assassinated by John Wilkes Booth.

Both of these sentences contain person-to-person entity relationships, but the two people are related for obviously different reasons. With the predicate vector, Rosoka lets users see what type of relationship binds the two together. Bob and Joe are connected by a “communicated_with” predicate vector; a “killed_by” predicate vector connects Abraham Lincoln and John Wilkes Booth.
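To make the subject-predicate-object format concrete, here is a minimal Python sketch of how the two example relationships could be stored and filtered by predicate. It is only an illustration: the Relationship class, its field names, and the by_predicate helper are hypothetical and do not reflect Rosoka's actual API or output schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """Hypothetical subject-predicate-object record (not Rosoka's schema)."""
    subject: str
    predicate: str
    obj: str


# The two example sentences, expressed as structured relationships.
relationships = [
    Relationship("Bob Smith", "communicated_with", "Joe Jones"),
    Relationship("Abraham Lincoln", "killed_by", "John Wilkes Booth"),
]


def by_predicate(rels, predicate):
    """Return only the relationships that use the given predicate."""
    return [r for r in rels if r.predicate == predicate]


for r in by_predicate(relationships, "killed_by"):
    print(f"{r.subject} --{r.predicate}--> {r.obj}")
```

Because the predicate is stored as its own field, the two person-to-person relationships can be told apart or filtered without re-reading the original sentences.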

Assisting with Data-as-a-Service

All of our projects include the base Rosoka entity library. Since most projects include custom entity types, we’re constantly expanding Rosoka’s engine rules and dictionaries.

Creating custom entities with proprietary rules is a process we are very familiar with. While the process is simple and powerful, it does take good technique to get the desired results. Over the years, we have added dozens of new entity types with hundreds of thousands of new dictionary entries to our local Rosoka library.

We continue to see major leaps forward in Rosoka’s entity extraction solutions and additional services like geo-tagging, sentiment, and multi-language support.

Need enriched open-source content? Set up a consultation with one of our data acquisition engineers to discuss what you’re working on.