Skip to main content

High Availability options with Hadoop distributions

With disaster recovery the talk of the town after recent storm on US east coast, let’s take a look at what options are available for High Availability with various distributions of Hadoop.

Let’s begin with understanding the HA issue in otherwise high reliable Apache Hadoop. An extract from Cloudera text:

“Before … the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine.

This reduced the total availability of the HDFS cluster in two major ways:
1. In the case of an unplanned event such as a machine crash, the cluster would be unavailable until an operator restarted the NameNode.
2. Planned maintenance events such as software or hardware upgrades on the NameNode machine would result in periods of cluster downtime.”

Now let us look at various HA options available:

(1) Cloudera:

1. 1 Quorum-based Storage

“Quorum-based Storage refers to the HA implementation that uses Quorum Journal Manager (QJM).
In order for the Standby node to keep its state synchronized with the Active node in this implementation, both nodes communicate with a group of separate daemons called JournalNodes…In the event of a failover, the Standby will ensure that it has read all of the edits from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.”

1. 2 Shared Storage Using NFS

“In order for the Standby node to keep its state synchronized with the Active node, this implementation requires that the two nodes both have access to a directory on a shared storage device (for example, an NFS mount from a NAS).
When any namespace modification is performed by the Active node, it durably logs a record of the modification to an edit log file stored in the shared directory. The Standby node constantly watches this directory for edits, and when edits occur, the Standby node applies them to its own namespace. In the event of a failover, the Standby will ensure that it has read all of the edits from the shared storage before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.”

More details are available in the Cloudera High Availability Guide

(2) HortonWorks

2. 1 HortonWorks and VMware HA

“… jointly developed Hortonworks Data Platform High Availability (HA) Kit for VMware vSphere customers that enables full stack high availability for Hadoop 1.0 by eliminating the NameNode and JobTracker single points of failure. It is a flexible virtual machine-based high availability solution that integrates with the VMware vSphere™ platform’s HA functionality to monitor and automate failover for NameNode and JobTracker master services running within the Hortonworks Data Platform (HDP).”


2. 2 Linux HA

Excerpt from FAQ which gives the gist of Linux HA
If this was so easy to do with Linux HA or other tools why didn’t the HDFS community do this earlier?
This is partly because the original HDFS team focused on very large clusters where cold failover was not practical. We assumed that Hadoop needed to provide its own built-in solution As we’ve developed this technology, we’ve heard directly from our customers that HA solutions are complex and that they prefer using their existing, well understood, solutions.”

The following presentation offers more insight into HortonWorks initiatives for HA and what to expect post Hadoop 2.0 stabilization. 

 (Click on image to read the pdf)

(3) MapR

MapR’s Lockless Storage Services feature a distributed HA architecture:
• The metadata is distributed across the entire cluster. Every node stores and serves a portion of the metadata.
• Every portion of the metadata is replicated on three different nodes (this number can be increased by the
administrator). For example, the metadata corresponding to all the files and directories under /project/
advertising would exist on three nodes. The three replicas are consistent at all times except, of course, for a
short time after a failure.
• The metadata is persisted to disk, just like the data.

The following illustration shows how metadata is laid out in a MapR cluster (in this case, a small 8-node cluster).
Each colored triangle represents a portion of the overall metadata; or in MapR terminology, the metadata of a single volume:


(4) IBM

4.1 Redundancy in Hardware for Master nodes (name node, secondary name node, job tracker)
4.2 Use GPFS-SNC

 (Click on image to read the pdf)

For the academically oriented, there a few research papers from IBM Research which offer more insight into the subject:
Feng Wang, Jie Qiu, Jie Yang, Bo Dong, Xinhui Li, and Ying Li. 2009. Hadoop high availability through metadata replication. In Proceedings of the first international workshop on Cloud data management (CloudDB '09). ACM, New York, NY, USA, 37-44

Rajagopal Ananthanarayanan, Karan Gupta, Prashant Pandey, Himabindu Pucha, Prasenjit Sarkar, Mansi Shah, and Renu Tewari. 2009. Cloud analytics: do we really need to reinvent the storage stack?. In Proceedings of the 2009 conference on Hot topics in cloud computing (HotCloud'09). USENIX Association, Berkeley, CA, USA, , pages.

Lanyue Lu, Dean Hildebrand, and Renu Tewari. 2011. ZoneFS: Stripe remodeling in cloud data centers. In Proceedings of the 2011 IEEE 27th Symposium on Mass Storage Systems and Technologies (MSST '11). IEEE Computer Society, Washington, DC, USA, 1-10

 All copyright and trademarks lie with their respective owners.


  1. Hi,

    Thanks for posting this summary. I wanted to clarify one thing -- at the top of the post you've outlined two options as being "Cloudera" options. While I'm proud to acknowledge that most of the engineering for these options was done by engineers here at Cloudera, the entirety of the code is upstream in the Apache Software Foundation project. It is entirely open source, and requires no external dependencies to set up other non-Apache software.

    It's worth noting that this is unique among all the options you've described - the others either rely on proprietary non-free software (like MapR) or on complicated deployment (Linux-HA). The Linux-HA and VMware based solutions also only provide cold failover which is inadequate for real-time systems such as HBase.

    Regardless, thanks again for the post.

    -Todd Lipcon
    (Engineer at Cloudera, Apache Hadoop committer)


Post a Comment

Popular posts from this blog

Hadoop's 10 in LinkedIn's 10

LinkedIn, the pioneering professional social network has turned 10 years old. One of the hallmarks of its journey has been its technical accomplishments and significant contribution to open source, particularly in the last few years. Hadoop occupies a central place in its technical environment powering some of the most used features of desktop and mobile app. As LinkedIn enters the second decade of its existence, here is a look at 10 major projects and products powered by Hadoop in its data ecosystem.
1)      Voldemort:Arguably, the most famous export of LinkedIn engineering, Voldemort is a distributed key-value storage system. Named after an antagonist in Harry Potter series and influenced by Amazon’s Dynamo DB, the wizardry in this database extends to its self healing features. Available in HA configuration, its layered, pluggable architecture implementations are being used for both read and read-write use cases.
2)      Azkaban:A batch job scheduling system with a friendly UI, Azkab…

Data deduplication tactics with HDFS and MapReduce

As the amount of data continues to grow exponentially, there has been increased focus on stored data reduction methods. Data compression, single instance store and data deduplication are among the common techniques employed for stored data reduction.
Deduplication often refers to elimination of redundant subfiles (also known as chunks, blocks, or extents). Unlike compression, data is not changed and eliminates storage capacity for identical data. Data deduplication offers significant advantage in terms of reduction in storage, network bandwidth and promises increased scalability.
From a simplistic use case perspective, we can see application in removing duplicates in Call Detail Record (CDR) for a Telecom carrier. Similarly, we may apply the technique to optimize on network traffic carrying the same data packets.
Some of the common methods for data deduplication in storage architecture include hashing, binary comparison and delta differencing. In this post, we focus on how MapReduce and…

Top Big Data Influencers of 2015

2015 was an exciting year for big data and hadoop ecosystem. We saw hadoop becoming an essential part of data management strategy of almost all major enterprise organizations. There is cut throat competition among IT vendors now to help realize the vision of data hub, data lake and data warehouse with Hadoop and Spark.
As part of its annual assessment of big data and hadoop ecosystem, HadoopSphere publishes a list of top big data influencers each year. The list is derived based on a scientific methodology which involves assessing various parameters in each category of influencers. HadoopSphere Top Big Data Influencers list reflects the people, products, organizations and portals that exercised the most influence on big data and ecosystem in a particular year. The influencers have been listed in the following categories:

AnalystsSocial MediaOnline MediaProductsTechiesCoachThought LeadersClick here to read the methodology used.

Analysts:Doug HenschenIt might have been hard to miss Doug…