
Top Big Data influencers of 2014

Big Data is an exciting technology space, innovating at a pace probably never seen before. With a dynamic ecosystem and a scorching pace of product development, it is easy to be left behind. However, thanks to the visionaries in this ecosystem who have decoded the maze and set things right for us, we are seeing successful big data use cases and implementations.

HadoopSphere presents below its annual list of top big data influencers. This list reflects the people, products, organizations and portals that exercised the most influence on big data and its ecosystem in a particular year. The influencers have been listed in the following categories:

  • Analysts
  • Online Media
  • Products
  • Social Media
  • Angels
  • Thought Leaders
The infographic below shows the Top Big Data influencers of 2014 as ranked by HadoopSphere.


Analysts:

Mike Gualtieri In 2013, Mike Gualtieri of Forrester predicted that big data would be the Time Person of the Year. Well, it almost came true, with a big data use (or misuse) case (Edward Snowden) making it to runner-up for Time Person of the Year. Besides occasionally playing sorcerer, Mike remained one of the most well-respected analysts in 2014, commanding a comprehensive vision and view of the data ecosystem.

Curt Monash If you have not been reading Curt Monash, you have probably been living on an island. And if you got a few incisive comments on your product, well, then you are probably part of the urban elite in this big data city. Don’t expect courtesies; expect a plain, honest assessment with technical depth from Curt.

Tony Baer Consistency and clarity are the forte of Tony Baer, Ovum’s principal analyst. Expect consistent, sane advice from Tony, with clear-cut guidance on what to expect and what not to expect. He has remained a top influencer in the big data and Hadoop ecosystem for another year.

Online Media:

TDWI With research papers, blogs, webinars and education events, TDWI continued to attract eyeballs and sponsors alike, making it one of the top focused industry portals.

IBM IBM is a technology company but runs a media machine of its own. Its data initiatives, such as IBM Data Magazine, IBM Big Data Hub, Big Data University, developerWorks and Redbooks, together continued to be among the top traffic getters. Though the content may be in part IBM specific, overall it did great work educating the big data community.

DZone With ‘smart content’ for big data professionals, DZone continued to encourage the community to contribute links, articles, guides and ‘refcardz’. DZone ensured both quality and a good volume of traffic, resulting in a high influence on techies.


Products:

Spark Do we need to say anything about this obvious choice? Apache Spark has been the flavor of all seasons since the beginning of 2014. With the biggest open source community in the big data ecosystem, it continued to define and influence the shape of future products.

Scala Although technically a programming language and not a product, Scala is listed here as it marches ahead to become a preferred language for big data programming. With both Apache Spark and Flink promoting it big time, the simplicity and power of the language became more obvious. We expect Scala to become one of the most powerful languages in a few years.

Kafka If you need to quote an example of word-of-mouth success, here it is. Apache Kafka was developed at LinkedIn and was not a part of the major Hadoop distributions till early 2015. Even so, it has emerged as a preferred choice for data ingestion and has seen adoption by internet companies, financial majors and travel portals, among others.

Social Media:

Ben Lorica If one has a dream Twitter handle like ‘bigdata’, it may not be sheer coincidence. It probably shows the handle owner was talking about big data before the rest of us had heard of it. Ben Lorica is the Chief Data Scientist and Director of Content Strategy for Data at O'Reilly Media, Inc. and commands the ‘bigdata’ Twitter handle with its impressive following.

Gregory Piatetsky-Shapiro Gregory is the President of KDnuggets, which provides analytics and data mining consulting, and a founder of KDD (the Knowledge Discovery and Data Mining conferences). He is one of the leading social influencers, with his mentions generating huge follower interest.

Kirk Borne Dr. Kirk Borne is Professor of Astrophysics and Computational Science at George Mason University. A data scientist and astrophysicist, he mostly talks about big data on social media and continues to attract a huge follower base.

Angel Investors:

Naval Ravikant An entrepreneur and angel investor, Naval Ravikant is co-founder of AngelList. Through this terrific forum and other offline activities, he has been championing the cause of many startups and taking them through the funding gates.

Data Collective DCVC (aka Data Collective) is a seed and early stage venture capital fund that invests in big data companies. Its extended team consists of more than 35 “Equity Partners,” who are notable technical founders and executives, data scientists and engineers. Notable portfolio companies include BlueData, Continuuity, Elasticsearch and Citus Data.

Thought Leaders:

Mike Olson As the Chief Strategy Officer of Cloudera, Mike Olson is a leader whose vision has been driving his company and much of the Hadoop ecosystem. His unbridled passion, combined with an ability to foresee market dynamics, makes him one of the biggest thought leaders and influencers in the entire information technology arena. From marketing Hadoop to touting Impala as MPP or mentoring competitive Spark, Mike has exhibited unparalleled transformational leadership.

Merv Adrian Merv Adrian is Research VP at Gartner and the better-known face of the research company in social media and event circles. Each year Gartner somehow manages to strike a rough note with the big data vendors, be it the “trough of disillusionment” or the “data lake fallacy” comment. However, Merv, with his astute knowledge of the Hadoop ecosystem, the BI world and technology lifecycles, has helped people read those discordant notes as a call to apply caution, restraint and intelligence beyond the obvious hype. And that’s what thought leaders do: create sense and a path out of chaos and conflict. Pro Tip: Merv may not agree with you, but he will still help the two of you find a common path.



  1. It is not easy listing the influencers in so many diverse areas. This list for 2014 and the one you published for 2013 remains one of the most diverse and well thought of list. Actually, amazed by the wide variety of sources from which you have extracted data to arrive at this list.
    Great stuff.


