
A future where storage and processing costs nothing

A statement from Jeff Hammerbacher’s presentation at SINA Innovations rekindled a line of thought about the future of data storage and processing.
Many years back, no one thought that sending messages could be free. Enter e-mail and the likes of Hotmail – suddenly we had a free way to send messages, and one that was dramatically faster than anything postal mail could achieve. The technology had existed for a long time but was not accessible to the mass consumer. Among the various technologies that propelled the internet revolution, e-mail occupies a significant king-maker role.
Let’s pan out now to 2015 (3 years from now) – could we have a Cloud pioneer who offers free data storage with quality of service, and charges little or nothing for processing? By that time, processing and compression will hopefully have improved exponentially, so that Hadoop would be running mammoth jobs in a few minutes on multi-core machines, with vastly improved versions of Impala and InfoSphere in the software ecosystem.

Could that be a possibility?

Let us look at business model prospects:
* Expense (big items only listed below):
  - Massive storage hardware
  - Immense computation power
  - Staff
  - Location-related expense
  - Marketing
* Revenue from:
  - Paid SaaS for custom needs
  - Advertising campaigns
  - Consulting and support

The obvious question, looking at the revenue section, is whether these streams would be sufficient to cover the costs.

Let us first critically analyze the three revenue streams:
- SaaS – opinion is still mixed on the success of SaaS in terms of the revenue pie. While Elastic MapReduce claims to have launched 3.7 million clusters since 2010, many other software vendors still boast about the traditional packaged-licensing model. Many also point out that free tiers and university packages are counted among these cluster numbers. The unrelated success of Google Docs, Prezi etc. also points to subscription success, but with limited dollar returns. So, is SaaS really worth the buck? More thought and debate needed.
- Advertising – there is a certain level of skepticism about the pay-per-click and pay-per-impression models. So while MapR may be hogging the background image on GigaOm posts and NetApp on AdWords, does that make them a darling of Big Data followers? Do we need to evolve a better lead-generation mechanism, especially for this niche audience? This points to a need for analytics for analytics – which could very well mean analyzing MapReduce job data to suggest offers to IT decision makers.
- Consulting – this is bound to be a major buck earner. However, doubt creeps in with low-labor-cost arbitrage. While Big Data strategy consulting is still highly expensive for now, technology consulting, implementation and support are bound to hit the labor-rate trough by the end of 2013. So, is it the volume of consultants, the variety of skills, or the velocity of delivery that you would prefer for absolute quality delivery?
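The “analytics for analytics” idea above can be sketched with a toy example: aggregate per-organization MapReduce usage from job logs and flag heavy users as candidates for a targeted offer. All field names, records and thresholds here are illustrative assumptions, not a real job-log schema or API.

```python
from collections import defaultdict

# Hypothetical MapReduce job-log records; fields are illustrative assumptions.
job_logs = [
    {"org": "acme-corp", "job": "etl-nightly", "cpu_hours": 480},
    {"org": "acme-corp", "job": "clickstream", "cpu_hours": 1250},
    {"org": "tiny-startup", "job": "weekly-report", "cpu_hours": 12},
]

# Aggregate compute usage per organization.
usage = defaultdict(int)
for rec in job_logs:
    usage[rec["org"]] += rec["cpu_hours"]

# Flag heavy users as candidates for a targeted consulting/support offer.
OFFER_THRESHOLD = 1000  # cpu-hours per period; an assumed cut-off
candidates = [org for org, hours in usage.items() if hours >= OFFER_THRESHOLD]
print(candidates)  # ['acme-corp']
```

In practice the same aggregation could run as a MapReduce job itself over cluster audit logs; the point is simply that the usage data for lead generation already exists on the cluster.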

With this critique, it is obvious that three basic innovations are required at this stage to achieve the key margins in our envisioned free storage and processing world:
1. Drastic reduction in the cost of hardware – especially storage and processors.
2. Innovation in business models. Let us elaborate on this point further.
   a. Most enterprise customers today want results from their analytics projects. They don’t want to build a million-dollar warehouse which, by the time it is built, is already outdated by newer systems and newer business requirements. So, could there be an era of revenue-sharing Big Data projects or other dollar-earning agile proposals?
   b. The major portion of revenue in enterprise deals comes from software and services. Hardware occupies a decent chunk but also acts as a road blocker more than once, with cost, lead time and depreciating asset value. Admittedly, many enterprises, unless compelled by regulatory and security factors, would be more than happy not to own the hardware. So, the business-model problem statement is: how do we optimize the cluster ownership cost for a customer?
3. This brings us to the third vital factor: security and governance. Many big enterprises would not like to use a public or virtual private cloud. Until there are strong showcases of avant-garde security implemented in cloud environments, those concerns will persist.
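The cluster-ownership question in point 2b can be framed as a simple break-even calculation: given an upfront hardware cost, a depreciation horizon, fixed operating costs, and an hourly rate to rent equivalent capacity, find the monthly utilization beyond which owning beats renting. All figures below are assumptions for illustration, not market prices.

```python
# Break-even sketch: own a cluster vs. rent equivalent on-demand capacity.
# All numbers are hypothetical assumptions for illustration.

CAPEX = 300_000.0           # upfront hardware cost for a modest cluster ($)
LIFETIME_MONTHS = 36        # straight-line depreciation horizon
OPEX_PER_MONTH = 4_000.0    # power, space, admin staff share ($/month)
CLOUD_RATE_PER_HOUR = 25.0  # assumed hourly rental rate for equivalent capacity ($)

# Fixed monthly cost of ownership, independent of how much you run.
own_monthly = CAPEX / LIFETIME_MONTHS + OPEX_PER_MONTH

# Rental cost scales linearly with usage; owning wins past the break-even point.
break_even_hours = own_monthly / CLOUD_RATE_PER_HOUR
print(round(break_even_hours))  # ~493 cluster-hours per month
```

Under these assumed numbers, a customer running fewer than roughly 500 cluster-hours a month is better off renting – which is exactly the gap a free-storage, pay-for-processing provider could exploit.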

The nay-sayers are bound to point out that there is no free lunch. The optimists point to the hard work that brought us to the level at which we stand today. Lots of debating questions, lots of mixed opinions – but would the answer to all of these be a visionary entrepreneur who can deliver on this challenge? Is anyone game?

