
The compelling story of data visualization

In this post, let’s look at a business problem differently: let’s read the data visualization problem as a storyboard told in text. This post takes the liberty of excerpting blogs and websites to build a compelling case for data visualization, the skills around it, and the tools that help you attain incredible business insights.

Chester Liu[1]:  “When it comes to the topic of Big Data I have to make a public admission. I have a split personality. On the one hand the geek in me, from years spent as a software engineer, relishes the challenge of installing my own Hadoop cluster, writing MapReduce algorithms, and running all sorts of performance tests to see for myself how amazing the technology is. On the other hand, as a pragmatic product marketing manager …, I just want to get stuff done and understand my data ASAP, without writing a single line of code.”
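Liu’s “writing MapReduce algorithms” is less daunting than it sounds. As a hypothetical sketch (plain Python standing in for Hadoop’s map, shuffle, and reduce stages — not actual Hadoop code), the classic word count looks like this:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in records:
        for word in line.lower().split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups}

lines = ["big data is big", "data tools tame big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"])   # 3
print(counts["data"])  # 3
```

On a real cluster the map and reduce functions run in parallel across machines, but the three-stage shape of the computation is exactly this.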

Peter Wayner[2]: “Understanding the data and finding the right question to ask is often much more complicated than getting your Hadoop job to run quickly. That's really saying something because these tools are only half of the job.”

Ben Werther[3]: “Imagine what is possible. Raw data of any kind or type lands in Hadoop with no friction. Everyday business users can interactively explore, visualize and analyze any of that data immediately, with no waiting for an IT project. One question can lead to the next and take them anywhere through the data. And the connective tissue that makes this possible — bridging between lumbering batch-processing Hadoop and this interactive experience — are ‘software defined’ scale-out in-memory data marts that automatically evolve with users questions and interest...”

Peter Wayner[2]: “Many of the big data tools are also working with NoSQL data stores. These are more flexible than traditional relational databases, but the flexibility isn't as much of a departure from the past as Hadoop. NoSQL queries can be simpler because the database design discourages the complicated tabular structure that drives the complexity of working with SQL...”
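To see why a document store can sidestep the joins that drive SQL complexity, consider a hypothetical sketch in Python: one nested document holds what a relational schema would split across customer, order, and line-item tables, so a query becomes a plain traversal rather than a join (the customer/orders shape here is invented purely for illustration):

```python
# One nested document replaces the customers/orders/line_items
# join of a normalized relational schema.
customer = {
    "name": "Acme Corp",
    "orders": [
        {"id": 1, "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]},
        {"id": 2, "items": [{"sku": "A1", "qty": 5}]},
    ],
}

def total_qty(doc, sku):
    """Walk the nested structure directly -- no join required."""
    return sum(item["qty"]
               for order in doc["orders"]
               for item in order["items"]
               if item["sku"] == sku)

print(total_qty(customer, "A1"))  # 7
```

The trade-off Wayner hints at is real: the traversal is simple only because the data was denormalized up front to match this one access pattern.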

Stefan Groschupf[4]: “With Hadoop, there was no limitation of storage and compute anymore and we felt that machine power could be used to overcome the slow, cumbersome, manual processes like ETL or data modeling that always get in the way of finding insights.”

For instance, Curt Monash[5] asserts: “Platfora’s core concept is probably the “lens”, which is a snowflake-schema mini data mart materialized onto Platfora’s servers via a Hadoop job. A lens is meant to be used in-memory but can certainly be large enough to spill onto disk, which is why I call Platfora’s data store “memory-centric” rather than “in-memory”. A lens is a lot like a materialized view, including in that it’s incrementally maintained…
  • You have data in Hadoop.
  • You extract data into a memory-centric data store.
  • You can do drilldowns back into Hadoop.
  • …through a browser, thanks to HTML5.”

[6]: “The cool thing about exploring data is that you can play with numbers and learn by piecing together different variables. Don’t trap your data in a static image. Let it loose and help others find their stories in the numbers. Make your data fun.”
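Monash’s point that a lens is “a lot like a materialized view, including in that it’s incrementally maintained” can be sketched in a few lines. The toy `Lens` class below is my own assumption for illustration, not Platfora’s actual design: it keeps an in-memory aggregate and folds in only newly arrived rows, instead of recomputing everything from the raw data on each refresh:

```python
class Lens:
    """Toy in-memory aggregate, incrementally maintained
    in the spirit of a materialized view (illustrative only)."""

    def __init__(self):
        self.totals = {}  # region -> running sum

    def absorb(self, rows):
        # Incremental refresh: fold only the new rows into the
        # existing totals rather than rescanning all source data.
        for region, amount in rows:
            self.totals[region] = self.totals.get(region, 0) + amount

lens = Lens()
lens.absorb([("east", 100), ("west", 50)])  # initial batch extracted from Hadoop
lens.absorb([("east", 25)])                 # later batch: only the delta is processed
print(lens.totals["east"])  # 125
```

The payoff is interactivity: once the aggregate lives in memory and updates by deltas, a browser front end can query it instantly instead of waiting on a batch job.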
Andy Cotgreave[7]: “…Each of these visualizations has common themes:
  • They were designed to create change… Whether it's improving sales or finding cures for diseases, visualizations help people make decisions based on data.
  • The people who created these visualizations were passionate. In order to make change, you need passion too.
  • The visualization is not the whole story. A visualization itself cannot stand alone. The change achieved by these visualizations came about because their designers went out and pushed their views, supported by these visualizations. If you want to make change, your visualization also needs to be promoted by you.”

Qunb[8]: “It’s not just about big data. It’s also about the incredible details.”

Peter Wayner[2]: “These all bear investigation, but the software is the least of the challenges.”

Susan Puccinelli[9]:  “There’s really no excuse to not get started working with whatever data you have today.”

[1] Chester Liu in Who, Me? I’m Big Data? (Part 1 in the “Body of Big Data” Series)

[2] Peter Wayner in 7 top tools for taming big data

[3] Ben Werther in The End of the Data Warehouse

[5] Curt Monash in Introduction to Platfora

[7] Andy Cotgreave in What are the top 5 visualizations of all time?

[9] Susan Puccinelli in Data Analytics in PR – Showing My Work

