
The compelling story of data visualization

In this post, let’s look at a business problem differently: as a data visualization story told in storyboard form. The post takes the liberty of excerpting blogs and websites to build a compelling case for data visualization, the skills around it, and the tools that help you attain remarkable business insights.

Chester Liu[1]:  “When it comes to the topic of Big Data I have to make a public admission. I have a split personality. On the one hand the geek in me, from years spent as a software engineer, relishes the challenge of installing my own Hadoop cluster, writing MapReduce algorithms, and running all sorts of performance tests to see for myself how amazing the technology is. On the other hand, as a pragmatic product marketing manager …, I just want to get stuff done and understand my data ASAP, without writing a single line of code.”

Peter Wayner[2]: “Understanding the data and finding the right question to ask is often much more complicated than getting your Hadoop job to run quickly. That's really saying something because these tools are only half of the job.”

Ben Werther[3]: “Imagine what is possible. Raw data of any kind or type lands in Hadoop with no friction. Everyday business users can interactively explore, visualize and analyze any of that data immediately, with no waiting for an IT project. One question can lead to the next and take them anywhere through the data. And the connective tissue that makes this possible — bridging between lumbering batch-processing Hadoop and this interactive experience — are ‘software defined’ scale-out in-memory data marts that automatically evolve with users questions and interest...”

Peter Wayner[2]: “Many of the big data tools are also working with NoSQL data stores. These are more flexible than traditional relational databases, but the flexibility isn't as much of a departure from the past as Hadoop. NoSQL queries can be simpler because the database design discourages the complicated tabular structure that drives the complexity of working with SQL...”
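To make that point concrete, here is a toy contrast (hypothetical schemas, for illustration only, with plain Python dicts standing in for a database): the same question that needs a join across normalized relational tables becomes a single lookup against a denormalized document.

```python
# Hypothetical example: "total order value for customer Acme" in two shapes.

# Relational style: normalized tables joined on customer_id.
customers = {1: {"name": "Acme"}}
orders = [{"customer_id": 1, "total": 40.0},
          {"customer_id": 1, "total": 60.0}]
join_total = sum(o["total"] for o in orders
                 if customers[o["customer_id"]]["name"] == "Acme")

# Document style: orders nested inside the customer document,
# so the query is one lookup with no join logic at all.
doc = {"name": "Acme", "orders": [{"total": 40.0}, {"total": 60.0}]}
doc_total = sum(o["total"] for o in doc["orders"])

print(join_total, doc_total)  # 100.0 100.0
```

The trade-off, of course, is that the document shape pre-commits to one access path, which is exactly the "less tabular complexity" Wayner describes.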

Stefan Groschupf[4] : “With Hadoop, there was no limitation of storage and compute anymore and we felt that machine power could be used to overcome the slow, cumbersome, manual processes like ETL or data modeling that always gets in the way of finding insights.”

For instance, Curt Monash[5] asserts: “Platfora’s core concept is probably the ‘lens’, which is a snowflake-schema mini data mart materialized onto Platfora’s servers via a Hadoop job. A lens is meant to be used in-memory but can certainly be large enough to spill onto disk, which is why I call Platfora’s data store ‘memory-centric’ rather than ‘in-memory’. A lens is a lot like a materialized view, including in that it’s incrementally maintained…
  • You have data in Hadoop.
  • You extract data into a memory-centric data store.
  • You can do drilldowns back into Hadoop.
  • …through a browser, thanks to HTML5.”

[6]: “The cool thing about exploring data is that you can play with numbers and learn by piecing together different variables. Don’t trap your data in a static image. Let it loose and help others find their stories in the numbers. Make your data fun.”
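The incremental-maintenance idea behind a lens can be sketched in a few lines (a toy illustration of the materialized-view concept, not Platfora’s actual implementation): an in-memory rollup is kept in sync by folding in only newly arrived raw records, instead of rescanning everything in Hadoop.

```python
from collections import defaultdict

class Lens:
    """Toy incrementally-maintained aggregate over raw event records.

    Hypothetical sketch of the 'materialized view' idea; field names
    like 'region' and 'revenue' are invented for the example.
    """

    def __init__(self):
        # In-memory rollup: region -> total revenue.
        self.totals = defaultdict(float)

    def refresh(self, new_records):
        # Incremental maintenance: fold in only the new batch,
        # never rescan the full raw data set.
        for rec in new_records:
            self.totals[rec["region"]] += rec["revenue"]

lens = Lens()
lens.refresh([{"region": "EMEA", "revenue": 100.0},
              {"region": "APAC", "revenue": 50.0}])
lens.refresh([{"region": "EMEA", "revenue": 25.0}])   # second batch arrives later
print(lens.totals["EMEA"])  # 125.0
```

Drilldowns back into Hadoop would then only be needed for questions the rollup does not already answer.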
Andy Cotgreave[7]: “…Each of these visualizations has common themes:
  • They were designed to create change… Whether it's improving Sales or finding cures for diseases, visualizations help people make decisions based on data.
  • The people who created these visualizations were passionate. In order to make change, you need passion too.
  • The visualization is not the whole story. A visualization by itself cannot stand alone. The change achieved by these visualizations came about because their designers went out and pushed their views, supported by these visualizations. If you want to make change, your visualization also needs to be promoted by you.”

Qunb[8]: “It’s not just about big data. It’s also about the incredible details.”

Peter Wayner[2]: “These all bear investigation, but the software is the least of the challenges.”

Susan Puccinelli[9]:  “There’s really no excuse to not get started working with whatever data you have today.”

[1] Chester Liu in Who, Me? I’m Big Data? (Part 1 in the “Body of Big Data” Series)

[2] Peter Wayner in 7 top tools for taming big data

[3] Ben Werther in The End of the Data Warehouse

[5] Curt Monash in Introduction to Platfora

[7] Andy Cotgreave in What are the top 5 visualizations of all time?

[9] Susan Puccinelli in Data Analytics in PR – Showing My Work