The safety-first approach to using Hadoop for big data

Unstructured information needs to be protected, but the most-hyped tool that manages it has some pros and cons. Consider these five critical steps.


As a “big data” technology, the open source Apache Hadoop architecture should be seen as the “poster child” for current and future big data challenges in protecting and securing sensitive information, according to Kevvie Fowler, a partner in KPMG Canada’s forensic practice.

Citing industry statistics predicting the Hadoop market will grow to $46 billion by 2018, Fowler noted that network managers and technology decision makers should pay close attention to big data technology as a potential answer to network issues. Of that figure, he added, “over $20 billion is going to be strictly Hadoop.” Fowler made the high-level comments during his recent “Big Data Security: Securing the Insecurable” breakout session at the SecTor security conference in Toronto.

Related: Hadoop on the network: How to prep for big data

From a general perspective, Fowler noted that big data represents a “perfect storm of risk” for organizations and that ultimately “big data breaches are inevitable.” Depending on the size of their data management systems, many organizations tend to run into big data challenges in the 500 terabyte to petabyte range, “in terms of having too much data that they can’t make sense of, process, or store using their actual existing infrastructures,” said Fowler.

As a data analytics and processing technology, Hadoop isn’t simple to secure natively, Fowler admitted, adding that robust third-party solutions would need to come into play from an encryption, authorization, and logging perspective. Organizations should consider the inherent security challenges around the sheer volume of data to be secured, network architectural design, and the minimal native security features of the technology.

It’s about minimizing the security risk, said Fowler, and boosting network security involves steps such as:

  • Identifying big data use and security requirements (“If you don’t need sensitive data, don’t store it”),
  • Using configuration management tools to manage and deploy clusters (“There are some free solutions that are out there”),
  • Validating nodes and client applications prior to admission to the cluster (“You can actually put ACLs in place to restrict certain users”),
  • Leveraging transmission-level security to ensure privacy of communications between clusters, and
  • Securing Hadoop-related applications with third-party applications and extensions.
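To make the access-control and transmission-security steps above concrete, here is a minimal sketch of the kind of built-in settings a stock Apache Hadoop deployment exposes in its `core-site.xml` and `hdfs-site.xml` configuration files. The property names are standard Hadoop settings; the group name `hadoopusers` is a placeholder, and a real deployment would also need Kerberos infrastructure and per-service ACLs in `hadoop-policy.xml`:

```xml
<!-- core-site.xml: turn on authentication, authorization, and RPC privacy -->
<configuration>
  <!-- Require Kerberos authentication instead of the default "simple" mode -->
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <!-- Enable service-level authorization (ACLs defined in hadoop-policy.xml) -->
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
  <!-- "privacy" = authenticate, check integrity, and encrypt RPC traffic -->
  <property>
    <name>hadoop.rpc.protection</name>
    <value>privacy</value>
  </property>
</configuration>

<!-- hdfs-site.xml: encrypt the DataNode block-transfer channel -->
<configuration>
  <property>
    <name>dfs.encrypt.data.transfer</name>
    <value>true</value>
  </property>
</configuration>
```

Settings like these address the ACL and wire-privacy points natively; per Fowler’s comments, encryption at rest, fine-grained authorization, and logging are where third-party tools and security-enhanced distributions typically fill the gaps.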

Bottom line: the more big data organizations have, the more mission-critical protecting it becomes. With this in mind, “security-enhanced” distributions of Hadoop might be a potential solution, a starting point, for organizations looking to boost data protection, security and privacy. “A lot of the practices from Apache Hadoop…can be applied in theory to other big data technologies,” he offered.
