
The compelling story of data visualization



In this post, let’s look at a business problem differently: let’s present the data visualization problem as a text storyboard. This post takes the liberty of excerpting blogs and websites to build a compelling case for data visualization, the skills around it, and the tools that help you attain remarkable business insights.

Chester Liu[1]:  “When it comes to the topic of Big Data I have to make a public admission. I have a split personality. On the one hand the geek in me, from years spent as a software engineer, relishes the challenge of installing my own Hadoop cluster, writing MapReduce algorithms, and running all sorts of performance tests to see for myself how amazing the technology is. On the other hand, as a pragmatic product marketing manager …, I just want to get stuff done and understand my data ASAP, without writing a single line of code.”

Peter Wayner[2]: “Understanding the data and finding the right question to ask is often much more complicated than getting your Hadoop job to run quickly. That's really saying something because these tools are only half of the job.”

Ben Werther[3]: “Imagine what is possible. Raw data of any kind or type lands in Hadoop with no friction. Everyday business users can interactively explore, visualize and analyze any of that data immediately, with no waiting for an IT project. One question can lead to the next and take them anywhere through the data. And the connective tissue that makes this possible — bridging between lumbering batch-processing Hadoop and this interactive experience — are ‘software defined’ scale-out in-memory data marts that automatically evolve with users questions and interest...”

Peter Wayner[2]: “Many of the big data tools are also working with NoSQL data stores. These are more flexible than traditional relational databases, but the flexibility isn't as much of a departure from the past as Hadoop. NoSQL queries can be simpler because the database design discourages the complicated tabular structure that drives the complexity of working with SQL...”
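Wayner's point about query simplicity can be illustrated with a small sketch (a hypothetical example, not drawn from the quoted article): in a document store, related data is typically embedded in a single record, so a question that would require a SQL join collapses into a simple key access.

```python
# Hypothetical illustration: the same "orders for customer Acme" question
# asked against a relational-style layout (two flat tables joined by key)
# and a document-style layout (orders embedded in the customer record).

# Relational-style: two tables, joined in application code.
customers = [{"id": 1, "name": "Acme"}]
orders = [{"customer_id": 1, "item": "widget"},
          {"customer_id": 1, "item": "gear"}]
acme_orders_sql_style = [o["item"]
                         for c in customers if c["name"] == "Acme"
                         for o in orders if o["customer_id"] == c["id"]]

# Document-style: one nested record, no join needed.
customer_doc = {"name": "Acme",
                "orders": [{"item": "widget"}, {"item": "gear"}]}
acme_orders_doc_style = [o["item"] for o in customer_doc["orders"]]

print(acme_orders_sql_style)  # ['widget', 'gear']
print(acme_orders_doc_style)  # ['widget', 'gear']
```

The trade-off, of course, is that the embedded layout duplicates data if orders must also be queried on their own; the flexibility Wayner describes cuts both ways.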


Stefan Groschupf[4]: “With Hadoop, there was no limitation of storage and compute anymore and we felt that machine power could be used to overcome the slow, cumbersome, manual processes like ETL or data modeling that always gets in the way of finding insights.”


For instance, “Platfora’s core concept is probably the ‘lens’, which is a snowflake-schema mini data mart materialized onto Platfora’s servers via a Hadoop job. A lens is meant to be used in-memory but can certainly be large enough to spill onto disk, which is why I call Platfora’s data store ‘memory-centric’ rather than ‘in-memory’. A lens is a lot like a materialized view, including in that it’s incrementally maintained…
  • You have data in Hadoop.
  • You extract data into a memory-centric data store.
  • You can do drilldowns back into Hadoop.
  • …through a browser, thanks to HTML5.” asserts Curt Monash[5]
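Monash's comparison to an incrementally maintained materialized view can be sketched in a few lines (a hypothetical illustration, not Platfora's actual implementation): rather than rescanning all the raw data for every query, a small in-memory aggregate is folded forward as new batches of records arrive.

```python
from collections import defaultdict

# Hypothetical sketch of an incrementally maintained aggregate,
# in the spirit of a "lens"/materialized view. Raw records stay in
# the backing store (think Hadoop); the in-memory view keeps only
# per-region sales totals and is updated one batch at a time.

sales_by_region = defaultdict(float)  # the "memory-centric" view

def apply_batch(view, records):
    """Fold a batch of new raw records into the view incrementally."""
    for rec in records:
        view[rec["region"]] += rec["amount"]

apply_batch(sales_by_region, [{"region": "EMEA", "amount": 100.0},
                              {"region": "APAC", "amount": 50.0}])
apply_batch(sales_by_region, [{"region": "EMEA", "amount": 25.0}])

print(sales_by_region["EMEA"])  # 125.0
```

A drilldown "back into Hadoop," in this sketch, would simply mean re-reading the raw records behind one of the view's keys when a user asks for detail the aggregate no longer holds.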

Infoactive.us[6] says: “The cool thing about exploring data is that you can play with numbers and learn by piecing together different variables. Don’t trap your data in a static image. Let it loose and help others find their stories in the numbers. Make your data fun.”

Andy Cotgreave[7]: “…Each of these visualizations has common themes:
  • They were designed to create change… Whether it's improving Sales or finding cures for diseases, visualizations help people make decisions based on data.
  • The people who created these visualizations were passionate. In order to make change, you need passion too.
  • The visualization is not the whole story. A visualization itself can not stand alone. The change achieved by these visualizations came about because their designers went out and pushed their views, supported by these visualizations. If you want to make change, your visualization also needs to be promoted by you.”

Qunb[8]: “It’s not just about big data. It’s also about the incredible details.”

Peter Wayner[2]: “These all bear investigation, but the software is the least of the challenges.”

Susan Puccinelli[9]:  “There’s really no excuse to not get started working with whatever data you have today.”


[1] Chester Liu in Who, Me? I’m Big Data? (Part 1 in the “Body of Big Data” Series)

[2] Peter Wayner in 7 top tools for taming big data

[3] Ben Werther in The End of the Data Warehouse

[5] Curt Monash in Introduction to Platfora

[6] http://infoactive.co/about/

[7] Andy Cotgreave in What are the top 5 visualizations of all time?

[8] http://www.qunb.com/about.html

[9] Susan Puccinelli in Data Analytics in PR – Showing My Work

