Apache Hama is one of the under-hyped projects in the Hadoop ecosystem but gaining a lot of traction steadily with the efforts of its committers. "Apache Hama is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms."
First a short introduction to BSP in Apache Hadoop context :
A look at Apache Hama components and its high level architecture :
Among its various use cases, a presentation which explores usage in Machine learning with Apache Hama - a slide deck from the recently concluded ApacheCon Europe 2012First a short introduction to BSP in Apache Hadoop context :
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop from Edward J. Yoon
A look at Apache Hama components and its high level architecture :
Algorithms available within Apache Hama:
Addition of multiple matricesMultiplication
Matrix Norm
Compute the transpose of matrix
Compute the determinant of square matrix
Cholesky Decomposition
Singular Value Decompostion
Other useful links:
Getting Started with Apache HamaApache Hama integration with YARN
Semi Clustering with Apache Hama
With its focus on data locality and certain reviews claiming much more faster processing of algorithms, this project is one to watch out for especially among the first movers on Hadoop 2.0 .
Comments
Post a Comment