Tagging and Data Harvesting

CASE STUDY: How Tagging and Data Harvesting Helps Keep You Updated on Life Events

Our lives are ever-changing. Most of us, if not all, will experience a major life change at some point. To grow, business owners must maintain meaningful relationships with their clients, which means being aware of the important moments in those clients’ lives. Tagging and data harvesting is a good place to start.

A client had BrightPlanet monitor social media feeds and open-source sites for indicators of life events. Life changes, such as a death, marriage, divorce, new job or the birth of a child, indicate when individuals are likely to make financial planning changes. The client was interested in targeting individuals who live in affluent communities to achieve a greater return on investment.

We tagged individuals whose content indicated a life change, combined their profile information with offline CRM content, and then channeled all of it into an outreach leads list for a new agent or asset manager to contact. While the platform was developed for wealth managers, other fields could benefit from a similar workflow.

There were two key areas for this project: harvesting the content and tagging data indicators.


We harvested data from multiple types of sources, including social media, obituaries, birth announcements, church websites and local community sites. Because the raw data contained little geographical context, we manually tagged each source with the areas that it served.

For example, when we targeted churches around the Boston area, we tagged each harvest with the cities or areas that the church served. Social media data was limited primarily to Twitter because of the availability of its content and metadata, and because it supported searching a geographic location for specific keywords.
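The manual geographic tagging described above can be pictured as a simple lookup from source to service area. The following is only a minimal sketch under stated assumptions: the domain names, the SOURCE_AREAS table, the Document class and the tag_areas function are all hypothetical and are not taken from the project.

```python
from dataclasses import dataclass, field

# Hypothetical, hand-maintained mapping from a harvest source to the
# cities or areas it serves (the manual tagging step described above).
SOURCE_AREAS = {
    "example-church.org": ["Boston", "Cambridge"],
    "example-obituaries.com": ["Brookline"],
}


@dataclass
class Document:
    source: str                                 # site the document was harvested from
    text: str                                   # harvested content
    areas: list = field(default_factory=list)   # geographic tags


def tag_areas(doc, source_areas=SOURCE_AREAS):
    """Attach the manually assigned areas for the document's source.

    Sources that are missing from the mapping get an empty tag list.
    """
    doc.areas = list(source_areas.get(doc.source, []))
    return doc
```

Every document harvested from a given source inherits that source's areas, which is why the approach works even when the document text itself never mentions a location.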


We ran entity extraction on the harvested documents, primarily to detect contact information and life change indicators. This project did not need full contact information. However, we captured full names, email addresses and any addresses that could be extracted.
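As an illustration of the simplest kind of contact extraction, email addresses can be pulled out of harvested text with a regular expression. This is a rough sketch, not the entity extractor used in the project: the pattern is a deliberately simplified assumption and the function name extract_emails is hypothetical.

```python
import re

# Simplified email pattern (an assumption for illustration); it will not
# cover every valid address format.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return unique email addresses, lowercased, in order of first appearance."""
    seen = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        if email not in seen:
            seen.append(email)
    return seen
```

Names and postal addresses are much harder to capture with patterns alone, which is where a dedicated entity extraction tool earns its keep.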


We created an additional custom entity to flag life events. This entity recorded every person referenced in relation to a death, the birth of a child, a marriage or divorce, or new employment. While the lists of indicators were not exhaustive, they were sufficient to flag contacts without many false positives.
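A keyword-driven tagger gives a feel for how such a life event entity might behave. This is a minimal sketch under stated assumptions: the keyword lists, the event labels and the tag_life_events function are hypothetical, and the project's actual entity definitions are not published in this case study.

```python
import re

# Hypothetical keyword lists per life event; short on purpose, not exhaustive.
LIFE_EVENT_KEYWORDS = {
    "death": ["passed away", "obituary", "funeral"],
    "birth": ["welcomed a baby", "birth of", "newborn"],
    "marriage": ["got married", "wedding", "engaged"],
    "divorce": ["divorce", "divorced"],
    "new_job": ["new job", "started a new position", "just got hired"],
}

# One case-insensitive, whole-phrase pattern per event.
PATTERNS = {
    event: re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    for event, keywords in LIFE_EVENT_KEYWORDS.items()
}


def tag_life_events(text):
    """Return the sorted list of life events whose keywords appear in the text."""
    return sorted(event for event, pattern in PATTERNS.items() if pattern.search(text))
```

Matching whole phrases rather than substrings is one inexpensive way to keep false positives down, at the cost of missing some phrasings.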



After we harvested the content, the hierarchical data went into Neo4j to properly represent the graph nature of the data. Once loaded into Neo4j, the data was rendered as a graph visualization using a proprietary platform. However, an open-source tool like Gephi or D3 can also work.
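To make the graph loading step more concrete, tagged records can be turned into parameterized Cypher MERGE statements. The sketch below assumes a hypothetical graph model (Person, LifeEvent and Area nodes, with EXPERIENCED and LOCATED_IN relationships) and a hypothetical record layout; the schema actually used in the project is not described here.

```python
# Hypothetical graph model: (Person)-[:EXPERIENCED]->(LifeEvent)
#                           (Person)-[:LOCATED_IN]->(Area)
MERGE_QUERY = (
    "MERGE (p:Person {name: $name}) "
    "MERGE (e:LifeEvent {type: $event}) "
    "MERGE (a:Area {name: $area}) "
    "MERGE (p)-[:EXPERIENCED]->(e) "
    "MERGE (p)-[:LOCATED_IN]->(a)"
)


def build_statements(records):
    """Return one (query, params) pair per person and life event.

    Each record is assumed to be a dict with the keys
    'name', 'events' (a list) and 'area'.
    """
    statements = []
    for record in records:
        for event in record["events"]:
            params = {"name": record["name"], "event": event, "area": record["area"]}
            statements.append((MERGE_QUERY, params))
    return statements
```

Each pair could then be passed to a Neo4j session, for example with the official Python driver's session.run(query, params). Using MERGE rather than CREATE keeps repeated harvests from duplicating people, events or areas.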

The Next Steps

Name disambiguation was an issue because names did not always match exactly between open-source data sources and offline data. In the future, third-party name disambiguation, like that available through Rosoka, could provide better matching.
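The matching problem can be illustrated with a basic fuzzy comparison from Python's standard library. This is only a sketch of the general idea and not how Rosoka or any other dedicated product works: the normalization rules, the 0.85 threshold and the function names are all assumptions.

```python
import re
from difflib import SequenceMatcher


def normalize_name(name):
    """Lowercase, reorder 'Last, First' to 'first last', and strip punctuation.

    ASCII-only simplification: accented letters are dropped.
    """
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    name = re.sub(r"[^a-z\s]", "", name.lower())
    return " ".join(name.split())


def name_similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 for two normalized names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def best_crm_match(name, crm_names, threshold=0.85):
    """Return the closest CRM name, or None if nothing reaches the threshold."""
    scored = [(name_similarity(name, candidate), candidate) for candidate in crm_names]
    score, match = max(scored, default=(0.0, None))
    return match if score >= threshold else None
```

A string ratio like this handles reordered names and small spelling differences, but it cannot tell two different people with the same name apart, which is the harder part of disambiguation.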

Are you interested in learning how data harvesting could provide you with important business intelligence? Schedule a consultation with one of our data acquisition engineers to discuss your organization, how its data is currently being leveraged, and how it could be enhanced with open-source data.