
Top Big Data Influencers of 2015

2015 was an exciting year for big data and the Hadoop ecosystem. We saw Hadoop become an essential part of the data management strategy of almost all major enterprise organizations. There is now cut-throat competition among IT vendors to help realize the vision of the data hub, data lake and data warehouse with Hadoop and Spark.

As part of its annual assessment of the big data and Hadoop ecosystem, HadoopSphere publishes a list of top big data influencers each year. The list is derived using a scientific methodology that assesses various parameters in each category of influencers. The HadoopSphere Top Big Data Influencers list reflects the people, products, organizations and portals that exercised the most influence on big data and its ecosystem in a particular year. The influencers have been listed in the following categories:

  • Analysts
  • Social Media
  • Online Media
  • Products
  • Techies
  • Coach
  • Thought Leaders


Analysts:

Doug Henschen It would have been hard to miss Doug Henschen writing for InformationWeek. With his accomplished media experience and proven expertise in industry analysis, Doug has now joined Wang at Constellation Research, where he talks about big data. His current focus areas include good data, streaming, cloud solutions and self-service data.

Merv Adrian A saner voice on big data at the influential research firm Gartner, Merv Adrian helps us make sense of the dichotomy between the data warehouse and the data lake. He understands the breadth and depth of the Hadoop ecosystem and provides the vision to cut through the hype.

Tony Baer When you talk to Tony Baer, don’t expect rebel thoughts, just plain incisive wisdom unravelling with each statement. With prose that reads like poetry, his analysis leaves an indelible mark on your understanding of the big data ecosystem. He has remained at the top of the Hadoop analyst game for many years in a row now.

Social Media:

Bernard Marr Bernard Marr is an author, speaker and consultant with wide interests in strategic performance, analytics, KPIs and big data. He is the founder of the Advanced Performance Institute and provides consulting to various organizations. Bernard has a massive following on Twitter, and his LinkedIn posts generate huge interaction and interest.

Cloudera Cloudera is the market leader in Hadoop distros and at the same time continues to influence social media followers. It may not have the most followers among companies, but most of its messages get the right amplification and impact. Kudos to the Cloudera social marketing team.

Gregory Piatetsky-Shapiro The President of KDnuggets, Gregory is also a founder of the KDD (Knowledge Discovery and Data Mining) conferences. His social media messages attract the right amount of traffic and eyeballs, making him one of the most relevant social media influencers.

Online Media:

O’Reilly Media O’Reilly Media is now a diversified group with interests ranging from books to blogs and from webinars to conferences. With Strata Hadoop World as one of its most visible products after books, O’Reilly Media is definitely shaping big data opinion in the industry.

TDWI With research papers, blogs, webinars and education events, TDWI continues to attract impressions and leads for marketers.

The Cube The Cube is a pioneering online television series filmed at various industry events. It brings the best minds onto the show to speak about the future of big data. Chic, image-setting television, it boasts CxO speakers like no other forum can.


Products:

Actian Vortex Actian Vortex is a really sharp SQL-in-Hadoop product that brings the best of database SQL to the Hadoop and YARN world. With innovative engineering under the hood to support ACID transactions and higher performance, it has motivated quite a few solutions in its arena.

Apache Flink Apache Flink started off as a research project and soon created a unique identity for its streaming capabilities. It has influenced quite a few features in competing streaming products, such as off-heap memory management, datasets and the like.

Kyvos Insights Kyvos Insights is an OLAP product that builds cubes at big data scale while assuring a low-latency SLA on Hadoop. With pre-canned cubes, interactive queries on terabytes of data within 2 seconds are a real possibility and an eye-catcher. As the trendsetter for cubes on Hadoop, it has inspired a few imitations in its trail, but none on par so far.


Techies:

Reynold Xin As one of the co-founders of Databricks and Apache Spark, Reynold Xin continues to influence major innovations in Spark, including Tungsten memory management, DataFrames and many more. Sharp and futuristic, he is a real tech force.

Roman Shaposhnik With the Open Data Platform (ODP), Roman Shaposhnik has found a new home for corporate Hadoop and continues to lead the initiative magnificently. Pushing along other Apache projects like Bigtop and acting as a mentor to others like Ignite, Roman emerged as a true tech leader last year.

Todd Lipcon When Todd brought HA to Hadoop, he brought Hadoop to the enterprise infrastructure. Now that Todd Lipcon has brought Kudu to Hadoop, he has brought Hadoop to the enterprise database. Believe it or not, Todd has unassumingly and unwittingly become the enterprise champion for Hadoop.


Coach:

Paco Nathan If you are looking for a Spark session at an industry event or in online resources, chances are you may have attended one of Paco Nathan’s sessions. An evil mad scientist, as he likes to proclaim himself, he is about much more than Spark, counting data science, maths, venture capital and learning coaching among his many interests.

Shane Curcuru Championing community over code and Apache open source over corporate proprietary software, Shane Curcuru has been evangelizing Apache for years now. As one of the ASF directors, he ensures the Apache brand name is taken care of in the right measure and that community-driven projects get their fair share of sun.

Thought Leaders:

Ion Stoica As one of the main founders of Apache Spark, Ion Stoica has already rallied the entire data world around one product. However, his vision with Databricks does not seem to be confined to just a batch execution engine. It seems Databricks is out to get a bigger share of the data center with its cloud offerings, and the innovations continue rolling in at an unprecedented velocity.

Mike Olson As the Chief Strategy Officer and Chairperson of Cloudera, Mike Olson has made sure Cloudera remains at the top of the Hadoop game. Resisting the bait of a market IPO or acquisition and staying on the innovation path, he has kept Cloudera a steady ship. Open to disruptions like Spark and embracing partners, he has been a true leader who thinks and acts with vision and authority.



