Top Big Data Influencers of 2015

2015 was an exciting year for big data and the Hadoop ecosystem. We saw Hadoop become an essential part of the data management strategy of almost all major enterprise organizations. There is now cut-throat competition among IT vendors to help realize the vision of the data hub, data lake and data warehouse with Hadoop and Spark. As part of its annual assessment of the big data and Hadoop ecosystem, HadoopSphere publishes a list of top big data influencers each year. The list is derived using a scientific methodology that assesses various parameters in each category of influencers. The HadoopSphere Top Big Data Influencers list reflects the people, products, organizations and portals that exercised the most influence on big data and its ecosystem in a particular year. The influencers are listed in the following categories: Analysts, Social Media, Online Media, Products, Techies, Coach and Thought Leaders. Click here to read the methodology used. Analysts:
Recent posts

Deep dive into Actian Vortex architecture

Innovation continues at a rapid pace in SQL on Hadoop solutions, with each vendor trying to outsmart the competition. In this second part of the interview with Actian’s Emma McGrattan, we try to understand the architecture of Actian Vortex’s SQL in Hadoop offering, with particular focus on the database/SQL layer named Vector. Emma is the Senior Vice President for Engineering at Actian and described the "Marchitecture" (as she likes to term it) in a conversation with Sachin Ghai. According to Emma, the Actian Vortex product suite is among the fastest and most mature SQL 'in' Hadoop offerings. Actian Engineering has clearly put a lot of thought and innovation into the Vortex architecture. It is one of those products where the engineering team knew exactly the nuts and bolts of Hadoop as well as the cranks and shafts of a database. It is currently rare to find an SQL offering that relies on HDFS for storage yet still achieves the enterprise-grade quality expected of the database category. Ut

SQL on Hadoop landscape with Actian's Emma McGrattan

The SQL on Hadoop capability is turning out to be a real game changer for data warehouse and Hadoop vendors. To get a sense of what is happening in this intriguing space, Sachin Ghai talked to Emma McGrattan about SQL on Hadoop solutions. Emma is the Senior Vice President for Engineering at Actian and is responsible for the analytics platform. In this two-part interview published on HadoopSphere, Sachin asked Emma about the generic SQL on Hadoop ecosystem and got further technical insights into the market space and technical architecture. Read on to understand more about SQL on Hadoop solutions (or SQL 'in' Hadoop, as Actian likes to term it). What is the state of the Hadoop ecosystem at this time? Where does it seem to be heading, particularly with regard to SQL on Hadoop solutions? If I could point you to the SQL in Hadoop landscape (graphical image), that is a good slide to use to explain where we see the landscape today and what we see as the future plans for some of t

Apache Spark turning into API haven

Apache Spark is the hottest technology around in big data. It has the most generous contributions from the open source community. But with the latest release, Apache Spark 1.6, a clear pattern is emerging in where it is heading. And currently that road seems to lead to an API haven. (Note that the term used here is haven, not heaven.) Spark innovations in 2015: March 2015: Spark 1.3 released. DataFrames introduced: SchemaRDD was renamed and extended to give rise to DataFrames. DataFrames are not just RDDs with a schema; they offer a large set of useful operations through an extensive API. For some strange reason, DataFrames were designed to be relational in nature only, and so were pitched directly alongside Spark SQL. A developer could use either the DataFrame API or SQL to query relational data residing in tables or in any Spark-supported data source (such as Parquet or JSON). Nov 2015: Spark 1.6 released.

Large scale graph processing with Apache Hama

The Apache Hama team recently released the official 0.7.0 version. According to the release announcement, there were big improvements in the Graph package. In this article, we provide an overview of the newly improved Graph package of Apache Hama and the benchmark results obtained by the cloud platform team at Samsung Electronics. Large scale datasets are increasingly used in many fields, and graph algorithms are becoming important for analyzing big data. By analyzing graph structure and characteristics, data scientists can predict customer behavior and market trends and make decisions. Currently there are a variety of open source graph analytic frameworks, such as Google’s Pregel [1], Apache Giraph [2], GraphLab [3] and GraphX [4]. These frameworks are aimed at computations ranging from classical graph traversal algorithms, to graph statistics calculations such as triangle counting, to complex machine learning algorithms. However these framewor