Guest Column | July 8, 2016

Applying Machine Learning To Identify Compromised Credentials


By David Rosenberg, CTO, DB Networks

It’s not uncommon for large enterprises to have hundreds or perhaps hundreds of thousands of databases: many out of date, many no longer used, and the vast majority not monitored or properly secured against possible attacks. Unauthorized database access is increasingly a result of credential theft, and IT personnel are urgently trying to get their arms around the situation. They know they need not only to discover all of their own databases, but also to figure out how to secure them once they do.

An Osterman Research study found that nearly 40 percent of enterprises are unable to monitor the majority of their databases in real time. When asked which database security issues concerned them most, survey respondents ranked compromised credentials first. The next biggest concerns were the potential for the organization to experience a major data breach, followed by the inability to identify data breaches until it’s too late.

When building a defense against today’s intelligent, stealthy, and truly nefarious threats, enterprises must discover and secure all of their databases. All too often, database identification and monitoring are omitted or only partially considered by the Security Operations Center (SOC) team.

It’s typical for IT personnel to rely on default measures for database security. All too often that approach, while simple to implement, is far less secure than what is required. With the ever-changing cybersecurity landscape, databases are now one of the most valuable prizes for hackers. Attacks can occur through vulnerable or rogue applications and, more recently, through stolen credentials of an application or privileged user. Even worse, attackers who penetrate the network through a third-party application and take over assets over an extended period of time have become all too common, as many high-profile breaches show.

Databases are so vulnerable because most legacy databases are simply not monitored to detect attacks or issues of any kind. They can remain idle on the network for years. To protect and secure user credentials, every database needs to be identified and reviewed, then either retired and decommissioned if it is no longer needed, or secured if it is still in use. All remaining databases need to be continuously monitored and managed. According to Osterman Research’s report, only 20 percent of organizations surveyed conduct database activity assessments on a more or less continuous basis. In fact, more than half of the organizations surveyed conduct database activity assessments at best once a quarter.

At a practical level, databases are made up of tables. If IT understands these tables and is able to view and monitor the activity associated with them, especially the critical ones, threats can be identified and acted upon far more easily.

Within these database tables is what we term a data flow, which combines two concepts. First, there are the attributes of the specific table being accessed by a read or write operation. Second, there is what we call the context: the details of the network conduit connecting clients and database servers. The attributes of the conduit include the database user ID, the user’s source IP address, the database server IP address and listener port, and the specific database instance. By applying machine learning to monitored data flow activity, abnormal activity, such as the use of stolen credentials, can be identified immediately.
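To make the data flow idea concrete, here is a minimal sketch in Python. It is not DB Networks' implementation, and it swaps real machine learning for the simplest possible stand-in: remembering every flow seen during a training window and flagging any flow outside that set. The names `DataFlow` and `FlowBaseline`, and all of the sample users, addresses, and tables, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class DataFlow:
    """One observed access: table attributes plus the conduit context."""
    table: str          # table being accessed
    operation: str      # "read" or "write"
    db_user: str        # database user ID
    source_ip: str      # client's source IP address
    server_ip: str      # database server IP address
    listener_port: int  # database listener port
    instance: str       # database instance name


class FlowBaseline:
    """Hypothetical baseline: the set of flows seen during training."""

    def __init__(self) -> None:
        self._known: Set[DataFlow] = set()

    def learn(self, flows: Iterable[DataFlow]) -> None:
        # Record every flow observed while behavior is assumed to be normal.
        self._known.update(flows)

    def is_anomalous(self, flow: DataFlow) -> bool:
        # A flow never seen during training is treated as abnormal.
        return flow not in self._known


# Illustrative training data: an application account's usual activity.
training = [
    DataFlow("orders", "read", "app_svc", "10.0.1.15", "10.0.9.2", 1521, "PROD1"),
    DataFlow("orders", "write", "app_svc", "10.0.1.15", "10.0.9.2", 1521, "PROD1"),
]
baseline = FlowBaseline()
baseline.learn(training)

# Same credentials, but from an unfamiliar host and against a table
# this account has never touched.
suspect = DataFlow("customers", "read", "app_svc", "10.0.7.88", "10.0.9.2", 1521, "PROD1")

normal_result = baseline.is_anomalous(training[0])
suspect_result = baseline.is_anomalous(suspect)
print(normal_result, suspect_result)
```

The point of the sketch is that valid credentials alone do not make a flow look normal: the user ID matches, but the source address and table do not. A production system would presumably replace the exact-match set with a statistical model that tolerates benign variation.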

In addition, enterprises should conduct a thorough inventory of all databases to understand what each database means to the organization and to understand the overall database attack surface. Unknown and unmanaged databases leave the organization exposed to extremely high security risk. This lack of visibility and control over database and data governance is making it difficult for enterprises to track down data breaches before they do real damage. Identifying databases, monitoring database tables, and applying machine learning to the analysis of data flow activity together enable compromised credentials to be identified immediately and a potential data breach to be contained.