The Opportunity in Context: Is your Big Data speaking your language?

Natural Language Processing


Pick one of the following two:

Alert: 123 computers indicate the presence of X malware


Alert 1: Computers belonging to 15 Senior Directors are indicating an infection of X malware. They are direct reports to the VP of Sales and Marketing, and all of them have come back today from a company event in Las Vegas. 45 other internal users also attended the same company event, but have yet to log into the company network.

Alert 2: 108 other computers across the internal network are also indicating the presence of X malware. These computers are not linked by any reporting chain or travel activity.


Yeah, my thoughts exactly. The second set of alerts offers much better framing in the context of the organization, and thus could drive the right immediate action: focused remediation, plus related mitigation such as proactive messaging and consideration of whether the organization is being targeted by malicious attackers. In addition, there could also be short- and long-term user education and awareness (especially about computing activity at conferences in Las Vegas).

Is this too much of a stretch of imagination and innovation? Assuming an event correlation platform can ingest the following:

  • Corporate badge activity
  • Corporate travel data
  • Corporate employee / user directory
  • Security incident and event management data

…and apply natural language processing, it shouldn’t be.
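To make the idea concrete, here is a minimal sketch of that kind of correlation. Everything in it is an assumption for illustration: the user names, the feed layouts (`DIRECTORY`, `TRAVEL`, `SIEM_INFECTED`, `LOGGED_IN_TODAY`) and the `build_alerts` helper are hypothetical, and the alert text comes from simple templates rather than a real natural language generation engine. It is not how any particular product works.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    title: str
    manager: str  # hypothetical directory field


# Hypothetical sample feeds; a real platform would ingest these from live systems.
DIRECTORY = {
    "alice": User("alice", "Senior Director", "VP of Sales and Marketing"),
    "bob": User("bob", "Senior Director", "VP of Sales and Marketing"),
    "carol": User("carol", "Engineer", "Director of Engineering"),
    "dave": User("dave", "Account Manager", "VP of Sales and Marketing"),
}
TRAVEL = {"alice": "Las Vegas", "bob": "Las Vegas", "dave": "Las Vegas"}  # user -> last trip
SIEM_INFECTED = {"alice", "bob", "carol"}  # users whose machines flagged the malware
LOGGED_IN_TODAY = {"alice", "bob", "carol"}  # stand-in for badge / network login activity


def build_alerts(malware: str) -> list[str]:
    """Group infected users by shared travel and describe each group in plain English."""
    by_trip = defaultdict(list)
    for user in sorted(SIEM_INFECTED):
        by_trip[TRAVEL.get(user)].append(user)

    alerts = []
    for trip, users in by_trip.items():
        if trip is None:
            # No travel record ties these machines together.
            alerts.append(
                f"{len(users)} computer(s) indicate {malware} and are not linked by travel activity."
            )
            continue
        managers = sorted({DIRECTORY[u].manager for u in users})
        # Fellow travelers who have not yet shown up on the network today.
        pending = sorted(u for u, t in TRAVEL.items() if t == trip and u not in LOGGED_IN_TODAY)
        alerts.append(
            f"{len(users)} computer(s) belonging to users who traveled to {trip} "
            f"indicate {malware}; they report to {', '.join(managers)}. "
            f"{len(pending)} other traveler(s) have yet to log in."
        )
    return alerts


for alert in build_alerts("X malware"):
    print(alert)
```

The point of the sketch is the join, not the wording: once the SIEM hits are keyed to people, the directory and travel feeds supply the context that turns a count into a story.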


Stanford’s CoreNLP suite is a GPL-licensed framework of tools for processing English, Chinese, and Spanish. It includes tools for tokenization (splitting of text into words), part-of-speech tagging, grammar parsing (identifying things like noun and verb phrases), named entity recognition, and more. Apache Lucene and Solr are not technically targeted at solving NLP problems, but they contain a large number of powerful tools for working with text, ranging from advanced string manipulation utilities to flexible tokenization libraries to blazing fast libraries for working with finite state automata. They also include a nice (and free) search engine. Beyond these needs, GATE and Apache UIMA assist with building complex NLP workflows that need to integrate several different processing steps.
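For readers new to the terms, here is a toy illustration of two of those steps, tokenization and named entity recognition, using only the Python standard library. It is not the CoreNLP, Lucene, GATE or UIMA API: the `GAZETTEER` entries, labels and function names are hypothetical, and real NER tools generally rely on trained statistical models rather than a lookup table like this one.

```python
import re

# Hypothetical gazetteer: lowercase token sequences mapped to entity labels.
GAZETTEER = {
    ("las", "vegas"): "LOCATION",
    ("vp", "of", "sales"): "TITLE",
    ("black", "hat"): "EVENT",
}


def tokenize(text: str) -> list[str]:
    """Split text into word tokens and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag_entities(tokens: list[str]) -> list[tuple[str, str]]:
    """Greedy longest-match lookup of token spans against the gazetteer."""
    max_len = max(len(key) for key in GAZETTEER)
    entities, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(t.lower() for t in tokens[i:i + n])
            if span in GAZETTEER:
                entities.append((" ".join(tokens[i:i + n]), GAZETTEER[span]))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return entities


tokens = tokenize("The VP of Sales flew back from Las Vegas today.")
print(tokens)
print(tag_entities(tokens))
```

Extracted entities like these are what would let a platform connect a free-text travel record or ticket note to a person, a place or an event.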

In summary, natural language processing has come a long way, and most of us see great examples on a daily basis as we use Siri or Cortana. Add in the right context, and the path from data to insight to action is shortened significantly, enabling a strategically sound response sooner.


Could enterprise security be better served by big data security platforms that ingest enough enterprise data feeds to provide a simple, contextualized set of natural language alerts dedicated to driving prioritized action, rather than requiring dedicated (human / expert) resources, multiple screens, directory lookups and several phone calls?

This mindset, or platform requirement, certainly isn’t new. In 2012, Sridhar Karnam (HP) posted an illustration that has since stuck in my mind as both simple and absent from most enterprise security event correlation platforms then... and now. Here it is:


Sridhar Karnam - Tip #1: Centralized approach – Unify security & IT operations



Many enterprise security solutions today talk about presenting “a single pane of glass” for the organization’s security posture. Very few seem to hold the goal of “De-FUD-ing” security as a paramount directive by presenting a holistic, asset-value-based, contextually sound view of security risks to the enterprise in plain English (or your language of choice).

Speaking of conferences in Las Vegas where malware often comes to mind, Black Hat and DefCon are starting in a couple of weeks and I’m hearing very good things about both, especially IoT Village at DefCon.
