
A future where storage and processing cost nothing

A statement from Jeff Hammerbacher’s presentation at SINA Innovations rekindled a line of thought about the future of data storage and processing.
Many years back, no one thought that sending messages could be free. Enter e-mail and the likes of Hotmail: we suddenly had a free way to send messages, and one that was embarrassingly faster than postal mail. The technology had existed for a long time but was not open to the mass consumer. Among the various technologies that propped up the internet revolution, e-mail occupies a significant kingmaker role.
Let’s fast-forward to 2015 (3 years from now): could we have a cloud pioneer who offers free data storage with quality of service and maybe charges little or nothing for processing? By that time, one hopes, processing and compression will also have improved exponentially, so that Hadoop runs mammoth jobs in a few minutes on multi-core machines, alongside vastly improved versions of Impala, InfoSphere and the rest of the software ecosystem.

Could that be a possibility?

Let us look at business model prospects:
* Expenses (big items only):
- Massive storage hardware
- Immense computation power
- Staff
- Location-related expenses
- Marketing
* Revenue from:
- Paid SaaS for custom needs
- Advertising campaigns
- Consulting and support

The obvious question, looking at the revenue section, is whether these streams would be sufficient to cover the costs.

Let us first critically analyze the three revenue streams:
- SaaS: opinion is still mixed on how successful SaaS is in terms of its share of the revenue pie. While Elastic MapReduce claims to have launched 3.7 million clusters since 2010, many other software vendors still swear by the traditional packaged licensing model. Many also point out that free tiers and university packages are counted among these cluster numbers. The unrelated success of Google Docs, Prezi etc. also points to subscription success, but with limited dollar returns. So, is SaaS really worth the buck? More thought and debate are needed.
- Advertising: there is a certain level of skepticism about the pay-per-click and pay-per-impression models. So while MapR may be hogging the background image on GigaOm posts and NetApp may be all over AdWords, does that make them darlings of the Big Data followers? Do we need to evolve a better lead generation mechanism, especially for this niche audience? This points to a need for analytics for analytics, which could very well mean analyzing MapReduce job data to suggest offers to IT decision makers.
- Consulting: this is bound to be a major earner. However, doubt creeps in with low-cost labor arbitrage. While Big Data strategy consulting is still highly expensive for now, technology consulting, implementation and support are bound to hit the labor rate trough by the end of 2013. So, is it the volume of consultants, the variety of skills or the velocity of delivery that you would prefer for absolute quality of delivery?

With this critique, it is clear that 3 basic innovations are required at this stage to achieve the key margins in our envisioned world of free storage and processing:
1- Drastic reduction in the cost of hardware, especially storage and processors.
2- Innovation in business models. Let us elaborate on this point further.
a. Most enterprise customers today want results from their analytics projects. They don’t want to build a million-dollar warehouse which, by the time it is built, is already outdated by newer systems and newer business requirements. So, could there be an era of revenue-sharing Big Data projects or other agile proposals that earn their keep?
b. The major portion of revenue in enterprise deals comes from software and services. Hardware occupies a decent chunk, but more than once it also acts as a roadblock because of its cost, its lead time and the fact that it is a depreciating asset. Admittedly, many enterprises, unless compelled by regulatory and security factors, would be more than happy not to own the hardware. So, the business model problem statement is: how do we optimize the cluster ownership cost for a customer?
3- This brings us to the third vital factor: security and governance. Many big enterprises would not like to use a public or virtual private cloud. Until there are strong showcases of avant-garde security implemented in cloud environments, those concerns will persist.

The naysayers are bound to point out that there is no free lunch. The optimists point to the hard work that has brought us to where we stand today. There are lots of questions to debate and lots of mixed opinions, but could the answer to all of these be a visionary entrepreneur who can deliver on this challenge? Is anyone game?



  1. Could be a game changer for sure...
    Hardware has long been a Chinese stronghold. Maybe an entrepreneur from there? Or maybe the strong EMC and IBM labs over there will lead the way?

