We had been expecting this… honestly, yes. Hortonworks, a Yahoo spin-off for
Hadoop, has made the headlines again. While Marissa Mayer was busy revealing a
new version of the Yahoo.com home page in a purple Facebook avatar, the erudite
guys at Hortonworks revealed the immediate roadmap for open source Apache Hadoop.
Hortonworks, the Palo Alto based company, revealed its three-pronged strategy to
stay ahead of the sharp competition in the Hadoop ecosystem. The key drivers of
its strategy include:
(1) Accelerating enterprise adoption of Hadoop
(2) Providing a low-latency, SQL-type query interface for Hadoop
(3) Making Hadoop clusters more secure
With these in mind, the company revealed 3 components of its strategy:
(1) Stinger initiative
(2) Tez framework
(3) Knox Gateway
Stinger seems to be a strategic move to keep Hive as the central interface for
Hadoop querying. Just this week, hadoopsphere.com published a post noting that
HCatalog could become part of the Hive project. With Stinger, the company plans
to use community-driven contributions to add more SQL-like query clauses to HQL
(Hive Query Language). It also claims to have achieved a 90% reduction in Hive
query result time. Another significant addition is the introduction of ORCFile,
which competes directly with Trevni and is meant to overcome the limitations of
the RCFile format. The Tez framework, described below, is also part of the
Stinger initiative. Future additions include Buffer Caching, a Vector Query
engine and a Query Planner.
Tez is one of the most exciting revelations of the day. It is a
“general-purpose, highly customizable framework that simplifies data-processing
tasks across both small scale (low-latency) and large-scale (high throughput)
workloads in Hadoop.”
(In Hindi, "tez" means fast.)
Tez apparently throws down a challenge to Cloudera Impala. It is currently
proposed as an Apache Incubator project with seed work already done, and it
already has 22 committers, which tells us something about how exciting the race
is. Tez aims to cut latency by running a query as a single job rather than as
multiple MapReduce jobs. Further, it aims to leverage YARN to share data
processing primitives across Apache Pig, Apache Hive, Cascading and others.
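To make the "single job rather than multiple MapReduce jobs" idea concrete, here is a toy Python model. It is not the Tez API; every name in it (FakeStorage, the stage functions, the sample log) is made up for illustration. It only assumes that each chained job has to write its output to storage before the next job can start, while a single job can pass rows from stage to stage and write just the final result.

```python
# Toy model only: this is not the Tez API, and every name here is hypothetical.
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Stage = Callable[[List[Row]], List[Row]]


class FakeStorage:
    """Stands in for HDFS and counts how often a job writes its output."""

    def __init__(self) -> None:
        self.writes = 0

    def write_and_read_back(self, rows: List[Row]) -> List[Row]:
        self.writes += 1
        return list(rows)


def keep_successful(rows: List[Row]) -> List[Row]:
    """Stage 1: WHERE status = 200."""
    return [r for r in rows if r["status"] == 200]


def count_per_page(rows: List[Row]) -> List[Row]:
    """Stage 2: GROUP BY page, COUNT(*)."""
    counts: Dict[str, int] = {}
    for r in rows:
        counts[r["page"]] = counts.get(r["page"], 0) + 1
    return [{"page": page, "hits": hits} for page, hits in counts.items()]


def order_by_hits(rows: List[Row]) -> List[Row]:
    """Stage 3: ORDER BY hits DESC, page ASC."""
    return sorted(rows, key=lambda r: (-r["hits"], r["page"]))


def run_as_chained_jobs(stages: List[Stage], rows: List[Row],
                        storage: FakeStorage) -> List[Row]:
    """One job per stage: every stage's output is written before the next runs."""
    for stage in stages:
        rows = storage.write_and_read_back(stage(rows))
    return rows


def run_as_single_job(stages: List[Stage], rows: List[Row],
                      storage: FakeStorage) -> List[Row]:
    """One job for the whole plan: only the final result is written."""
    for stage in stages:
        rows = stage(rows)
    return storage.write_and_read_back(rows)


log = [
    {"page": "/home", "status": 200},
    {"page": "/docs", "status": 200},
    {"page": "/home", "status": 200},
    {"page": "/docs", "status": 404},
    {"page": "/blog", "status": 200},
]
plan = [keep_successful, count_per_page, order_by_hits]

chained_storage = FakeStorage()
single_storage = FakeStorage()
chained_result = run_as_chained_jobs(plan, log, chained_storage)
single_result = run_as_single_job(plan, log, single_storage)

print("chained jobs wrote", chained_storage.writes, "times")
print("single job wrote", single_storage.writes, "time")
```

Both runs return the same rows; the only difference in this model is the number of writes, which is a stand-in for the job-launch and disk round-trips that a chain of MapReduce jobs would pay per stage.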
Knox is the other significant project that has been proposed for Apache
incubation. Since security is one of the key focus areas, it “provides a single
point of authentication and access for Apache Hadoop services in a cluster”.
Earlier, hadoopsphere.com had proposed a comprehensive security architecture for
an Apache Hadoop cluster, which could be implemented with custom-built utilities
or off-the-shelf tools. Knox fills in the vital authentication layer of that
security architecture instead of just relying on Kerberos. However, it still
needs a bit of work on cloud integration and a web interface for the Hadoop
cluster.
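The "single point of authentication and access" idea can be sketched in a few lines of Python. This is not Knox code and says nothing about how Knox is actually built; ToyGateway, the user table and the fake backends are all hypothetical. The sketch assumes only that every request passes through one place that checks credentials once and then forwards the call to the right cluster service.

```python
# Toy sketch only: this is not Knox code, and every name here is hypothetical.
from typing import Callable, Dict

Backend = Callable[[str], str]


class ToyGateway:
    """Checks credentials in one place, then forwards to a service by path prefix."""

    def __init__(self, users: Dict[str, str], routes: Dict[str, Backend]) -> None:
        # A real gateway would delegate to an identity store (LDAP, Kerberos, ...);
        # a plain dict of passwords is used here purely for illustration.
        self._users = users
        self._routes = routes

    def handle(self, user: str, password: str, path: str) -> str:
        # Single authentication check for every service behind the gateway.
        if user not in self._users or self._users[user] != password:
            raise PermissionError(f"authentication failed for {user!r}")
        # Single access point: pick the backend from the path prefix.
        for prefix, backend in self._routes.items():
            if path.startswith(prefix):
                return backend(path[len(prefix):])
        raise LookupError(f"no service registered for {path!r}")


def fake_hdfs(rest: str) -> str:
    return f"hdfs listing for {rest}"


def fake_hive(rest: str) -> str:
    return f"hive result for {rest}"


gateway = ToyGateway(
    users={"alice": "s3cret"},
    routes={"/hdfs": fake_hdfs, "/hive": fake_hive},
)

listing = gateway.handle("alice", "s3cret", "/hdfs/user/alice")
print(listing)
```

The point of the design is that the backends never see a password: clients talk to one endpoint, and the services behind it can stay off the public network.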
Overall, Hortonworks has shown its commitment to open source once again and is
driving home the point that community-based contributions can be innovative,
exciting and ‘tez’ (fast).
PS: Arun Murthy and folks at Hortonworks, please excuse the liberty of using
Arun’s image in a Speed movie look-alike poster. And of course, kudos to the
many more heroes in the team.
All other images in this post are taken from the Hortonworks blog.






