
Top Big Data Influencers of 2015

2015 was an exciting year for big data and the Hadoop ecosystem. We saw Hadoop become an essential part of the data management strategy of almost all major enterprise organizations. There is now cut-throat competition among IT vendors to help realize the vision of the data hub, data lake and data warehouse with Hadoop and Spark.

As part of its annual assessment of the big data and Hadoop ecosystem, HadoopSphere publishes a list of top big data influencers each year. The list is derived using a scientific methodology that assesses various parameters in each category of influencers. The HadoopSphere Top Big Data Influencers list reflects the people, products, organizations and portals that exercised the most influence on big data and its ecosystem in a particular year. The influencers are listed in the following categories:

  • Analysts
  • Social Media
  • Online Media
  • Products
  • Techies
  • Coach
  • Thought Leaders


Analysts:

Doug Henschen It would have been hard to miss Doug Henschen writing for InformationWeek. With his accomplished media experience and proven expertise in industry analysis, Doug has now joined Ray Wang at Constellation Research, where he talks about big data. His current focus areas include good data, streaming, cloud solutions and self-service data.

Merv Adrian A sane voice on big data at the influential research firm Gartner, Merv Adrian helps us make sense of the dichotomy between the data warehouse and the data lake. He understands the breadth and depth of the Hadoop ecosystem and provides the vision to cut through the hype.

Tony Baer When you talk to Tony Baer, don’t expect rebel thoughts, just plain incisive wisdom unravelling with each statement. With prose that reads like poetry, his analysis leaves an indelible mark on your understanding of the big data ecosystem. He has remained at the top of the Hadoop analyst game for many years in a row now.

Social Media:

Bernard Marr Bernard Marr is an author, speaker and consultant with wide interests in strategic performance, analytics, KPIs and big data. He is the founder of the Advanced Performance Institute and provides consulting to various organizations. Bernard has a massive following on Twitter, and his LinkedIn posts generate huge interaction and interest.

Cloudera Cloudera is the market leader in Hadoop distros and at the same time continues to influence social media followers. It may not have the most followers compared to other companies, but most of its messages get the right amplification and impact. Kudos to the Cloudera social marketing team.

Gregory Piatetsky-Shapiro Gregory is the President of KDnuggets and a founder of the KDD (Knowledge Discovery and Data Mining) conferences. His social media messages attract the right amount of traffic and eyeballs, making him one of the most relevant social media influencers.

Online Media:

O’Reilly Media O’Reilly Media is now a diversified group with interests ranging from books to blogs and from webinars to conferences. With Strata Hadoop World as one of its most visible products after books, O’Reilly Media is definitely shaping big data opinion in the industry.

TDWI With research papers, blogs, webinars and education events, TDWI continues to attract impressions and leads for marketers.

The Cube The Cube is a pioneering online television series filmed at various industry events. It brings the best minds onto the show to speak about the future of big data. Chic, image-setting television, it boasts CxO speakers like no other forum can.


Products:

Actian Vortex Actian Vortex is a really sharp SQL-in-Hadoop product that brings the best of database SQL to the Hadoop and YARN world. With innovative engineering under the hood to support ACID transactions and higher performance, it has motivated quite a few solutions in its arena.

Apache Flink Apache Flink started off as a research project and soon created a unique identity for its streaming capabilities. It has influenced quite a few features in competing streaming products, such as off-heap memory management, datasets and the like.

Kyvos Insights Kyvos Insights is an OLAP product that builds cubes at big data scale while assuring a low-latency SLA on Hadoop. With pre-canned cubes, interactive queries on terabytes of data within 2 seconds are a real possibility and an eye-catcher. As the trendsetter for cubes on Hadoop, it has inspired a few imitations in its trail, but none on par so far.


Techies:

Reynold Xin As one of the co-founders of Databricks and Apache Spark, Reynold Xin continues to influence major innovations in Spark, including Tungsten memory management, DataFrames and many more. Sharp and futuristic, he is a real tech force.

Roman Shaposhnik With the Open Data Platform (ODP), Roman Shaposhnik has found a new home for corporate Hadoop and continues to lead the initiative magnificently. Pushing other Apache projects such as Bigtop alongside and acting as a mentor to others like Ignite, Roman emerged as a true tech leader in the last year.

Todd Lipcon When Todd Lipcon brought HA to Hadoop, he brought Hadoop to the enterprise infrastructure. When he brought Kudu to Hadoop, he brought Hadoop to the enterprise database. Believe it or not, Todd has unassumingly and unwittingly become the enterprise champion for Hadoop.


Coach:

Paco Nathan If you are looking for a Spark session at an industry event or in online resources, chances are you may have attended one of Paco Nathan’s sessions. An evil mad scientist, as he likes to proclaim himself, he is about much more than Spark, counting data science, maths, venture capital and learning coaching among his many interests.

Shane Curcuru Community over code and Apache open source over corporate proprietary: Shane Curcuru has been evangelizing Apache for years now. As one of the ASF directors, he ensures that the Apache brand name is taken care of in the right measure and that community-driven projects get their fair share of sun.

Thought Leaders:

Ion Stoica As one of the main founders of Apache Spark, Ion Stoica has already rallied the entire data world around one product. However, his vision with Databricks does not seem to be confined to just a batch execution engine. It seems Databricks is out to get a bigger share of the data center with its cloud offerings, and the innovations continue rolling in at an unprecedented velocity.

Mike Olson As the Chief Strategy Officer and Chairperson of Cloudera, Mike Olson has made sure Cloudera remains at the top of the Hadoop game. Resisting the bait of an IPO or acquisition and staying on the innovation path, he has kept Cloudera a steady ship. Open to disruptions like Spark and embracing partners, he has been a true leader who thinks and acts with vision and authority.



