Skip to main content

High Availability options with Hadoop distributions

With disaster recovery the talk of the town after recent storm on US east coast, let’s take a look at what options are available for High Availability with various distributions of Hadoop.

Let’s begin with understanding the HA issue in otherwise high reliable Apache Hadoop. An extract from Cloudera text:

“Before … the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine.

This reduced the total availability of the HDFS cluster in two major ways:
1. In the case of an unplanned event such as a machine crash, the cluster would be unavailable until an operator restarted the NameNode.
2. Planned maintenance events such as software or hardware upgrades on the NameNode machine would result in periods of cluster downtime.”

Now let us look at various HA options available:

(1) Cloudera:

1. 1 Quorum-based Storage

“Quorum-based Storage refers to the HA implementation that uses Quorum Journal Manager (QJM).
In order for the Standby node to keep its state synchronized with the Active node in this implementation, both nodes communicate with a group of separate daemons called JournalNodes…In the event of a failover, the Standby will ensure that it has read all of the edits from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.”

1. 2 Shared Storage Using NFS

“In order for the Standby node to keep its state synchronized with the Active node, this implementation requires that the two nodes both have access to a directory on a shared storage device (for example, an NFS mount from a NAS).
When any namespace modification is performed by the Active node, it durably logs a record of the modification to an edit log file stored in the shared directory. The Standby node constantly watches this directory for edits, and when edits occur, the Standby node applies them to its own namespace. In the event of a failover, the Standby will ensure that it has read all of the edits from the shared storage before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.”

More details are available in the Cloudera High Availability Guide

(2) HortonWorks

2. 1 HortonWorks and VMware HA

“… jointly developed Hortonworks Data Platform High Availability (HA) Kit for VMware vSphere customers that enables full stack high availability for Hadoop 1.0 by eliminating the NameNode and JobTracker single points of failure. It is a flexible virtual machine-based high availability solution that integrates with the VMware vSphere™ platform’s HA functionality to monitor and automate failover for NameNode and JobTracker master services running within the Hortonworks Data Platform (HDP).”


2. 2 Linux HA

Excerpt from FAQ which gives the gist of Linux HA
If this was so easy to do with Linux HA or other tools why didn’t the HDFS community do this earlier?
This is partly because the original HDFS team focused on very large clusters where cold failover was not practical. We assumed that Hadoop needed to provide its own built-in solution As we’ve developed this technology, we’ve heard directly from our customers that HA solutions are complex and that they prefer using their existing, well understood, solutions.”

The following presentation offers more insight into HortonWorks initiatives for HA and what to expect post Hadoop 2.0 stabilization. 

 (Click on image to read the pdf)

(3) MapR

MapR’s Lockless Storage Services feature a distributed HA architecture:
• The metadata is distributed across the entire cluster. Every node stores and serves a portion of the metadata.
• Every portion of the metadata is replicated on three different nodes (this number can be increased by the
administrator). For example, the metadata corresponding to all the files and directories under /project/
advertising would exist on three nodes. The three replicas are consistent at all times except, of course, for a
short time after a failure.
• The metadata is persisted to disk, just like the data.

The following illustration shows how metadata is laid out in a MapR cluster (in this case, a small 8-node cluster).
Each colored triangle represents a portion of the overall metadata; or in MapR terminology, the metadata of a single volume:


(4) IBM

4.1 Redundancy in Hardware for Master nodes (name node, secondary name node, job tracker)
4.2 Use GPFS-SNC

 (Click on image to read the pdf)

For the academically oriented, there a few research papers from IBM Research which offer more insight into the subject:
Feng Wang, Jie Qiu, Jie Yang, Bo Dong, Xinhui Li, and Ying Li. 2009. Hadoop high availability through metadata replication. In Proceedings of the first international workshop on Cloud data management (CloudDB '09). ACM, New York, NY, USA, 37-44

Rajagopal Ananthanarayanan, Karan Gupta, Prashant Pandey, Himabindu Pucha, Prasenjit Sarkar, Mansi Shah, and Renu Tewari. 2009. Cloud analytics: do we really need to reinvent the storage stack?. In Proceedings of the 2009 conference on Hot topics in cloud computing (HotCloud'09). USENIX Association, Berkeley, CA, USA, , pages.

Lanyue Lu, Dean Hildebrand, and Renu Tewari. 2011. ZoneFS: Stripe remodeling in cloud data centers. In Proceedings of the 2011 IEEE 27th Symposium on Mass Storage Systems and Technologies (MSST '11). IEEE Computer Society, Washington, DC, USA, 1-10

 All copyright and trademarks lie with their respective owners.


  1. Hi,

    Thanks for posting this summary. I wanted to clarify one thing -- at the top of the post you've outlined two options as being "Cloudera" options. While I'm proud to acknowledge that most of the engineering for these options was done by engineers here at Cloudera, the entirety of the code is upstream in the Apache Software Foundation project. It is entirely open source, and requires no external dependencies to set up other non-Apache software.

    It's worth noting that this is unique among all the options you've described - the others either rely on proprietary non-free software (like MapR) or on complicated deployment (Linux-HA). The Linux-HA and VMware based solutions also only provide cold failover which is inadequate for real-time systems such as HBase.

    Regardless, thanks again for the post.

    -Todd Lipcon
    (Engineer at Cloudera, Apache Hadoop committer)


Post a Comment

Popular articles

5 online tools in data visualization playground

While building up an analytics dashboard, one of the major decision points is regarding the type of charts and graphs that would provide better insight into the data. To avoid a lot of re-work later, it makes sense to try the various chart options during the requirement and design phase. It is probably a well known myth that existing tool options in any product can serve all the user requirements with just minor configuration changes. We all know and realize that code needs to be written to serve each customer’s individual needs. To that effect, here are 5 tools that could empower your technical and business teams to decide on visualization options during the requirement phase. Listed below are online tools for you to add data and use as playground. 1)      Many Eyes : Many Eyes is a data visualization experiment by IBM Research and the IBM Cognos software group. This tool provides option to upload data sets and create visualizations including Scatter Plot, Tree Ma

Data deduplication tactics with HDFS and MapReduce

As the amount of data continues to grow exponentially, there has been increased focus on stored data reduction methods. Data compression, single instance store and data deduplication are among the common techniques employed for stored data reduction. Deduplication often refers to elimination of redundant subfiles (also known as chunks, blocks, or extents). Unlike compression, data is not changed and eliminates storage capacity for identical data. Data deduplication offers significant advantage in terms of reduction in storage, network bandwidth and promises increased scalability. From a simplistic use case perspective, we can see application in removing duplicates in Call Detail Record (CDR) for a Telecom carrier. Similarly, we may apply the technique to optimize on network traffic carrying the same data packets. Some of the common methods for data deduplication in storage architecture include hashing, binary comparison and delta differencing. In this post, we focus o

In-memory data model with Apache Gora

Open source in-memory data model and persistence for big data framework Apache Gora™ version 0.3, was released in May 2013. The 0.3 release offers significant improvements and changes to a number of modules including a number of bug fixes. However, what may be of significant interest to the DynamoDB community will be the addition of a gora-dynamodb datastore for mapping and persisting objects to Amazon's DynamoDB . Additionally the release includes various improvements to the gora-core and gora-cassandra modules as well as a new Web Services API implementation which enables users to extend Gora to any cloud storage platform of their choice. This 2-part post provides commentary on all of the above and a whole lot more, expanding to cover where Gora fits in within the NoSQL and Big Data space, the development challenges and features which have been baked into Gora 0.3 and finally what we have on the road map for the 0.4 development drive. Introducing Apache Gora Although