Top 5 Big Data and Hadoop Trends in 2014

As the year 2014 bid us goodbye, let’s uncover some of the key trends that dominated the big data and Hadoop arena during the year. A few key themes came to the fore, and considering that big data now dominates technology investments, these trends are indicative of the path the entire information technology world is taking.

(1)    Real time was the flavor of the year:

Much has been written about real-time big data analytics, or rather the lack of it in the traditional MapReduce world. Capable products in the form of Apache Tez, Cloudera Impala, IBM Big SQL, Apache Drill and Pivotal HAWQ were unleashed in 2013. And as the adage goes, 2013 was history in 2014. Apache Spark took center stage and ensured that everyone talked about at least near-real-time processing. Apache Storm, along with the rest of the streaming pack in the industry, also got its share of the limelight. Real-time big data is here, and it is here to get better with each product release.

(2)    R&D came to the fore:

It is not just Google, Microsoft, IBM, SAP and the like; many other exciting labs are coming to the party and investing huge dollars in big data R&D. Machine learning is passé as the real interest shifts to deep learning. Backed by years of research interest in artificial intelligence, neural networks and more, R&D in deep learning has found new zest. Large industry players like AT&T Research, as well as emerging companies like Impetus Technologies, continued to invest in big data warehouse research and brought in senior executives from other companies (like IBM) to ensure they research it right and develop it hard.

(3)    The big boys kept struggling:

Big data has never been a ground dominated by the big boys of IT. Rather, the new kids on the block, Cloudera, Hortonworks and MapR, have dominated the space and continued to do so in 2014. With a billion-dollar IPO under Hortonworks’ belt, things have never been so good for emerging product companies in the technology sector. These new companies are here to stay and to give many sleepless nights to the sales execs of established product companies.

(4)    It’s a man’s world:

Strange but true: whether you visit a big data conference, seminar or development shop, there have hardly been any women in the arena. Call it the invisible ceiling of the big data industry, but with the exception of Anjul Bhambhri, it is rare to see a woman dominating the scene. Even the proportion of women developers, architects and managers seems abysmally low – something we hope to see corrected in 2015 as more people take up Hadoop skills.

(5)    The shift in services world:

Ah, the cream of the revenue pie: professional services and consulting. Not just the product world, the services world too has shown some interesting trends in the emergence of new players. Big data services, unlike traditional analytics, are still dominated by specialized players rather than CMMi-certified IT majors. However, the lure of huge services headcount continued to tempt players like Cloudera into partnering with traditional vendors, where the focus, unfortunately, is still on cost arbitrage through laying off experienced staff. By the end of 2015, we should be able to see some new big names on the horizon. These companies do not have armies of certified professionals; rather, they have been establishing themselves by delivering successful, specialized big data solutions from small experienced teams.


Strata + Hadoop World for you and for me

Big Data has a big marketing problem: it’s been sold as a universal game-changer, but is perceived by some in the general public as overblown hype or a toy for the exclusive benefit of those few companies that can already afford it.
The situation hasn’t been helped by the recent revelations of widespread NSA communications monitoring, which have added to the not-unreasonable fears that always existed around mass-scale data collection.
Industry insiders see the immense progress being made on the technology front and understand the transformative power those shifts hold for society. But society, on the whole, may harbor some doubts.
It’s refreshing, then, to see that this year’s Strata + Hadoop World conference in New York (Oct. 15-17, 2014) is taking these concerns seriously and making a concerted effort to address them.
Industry ethics, data security, and privacy issues are among the main focuses of this year’s conference. Perhaps more importantly, the event, already a dependable weathervane for big data development, embodies the internal recognition that the industry, for the sake of its own growth as much as society’s, needs to make meaningful results accessible to the broader market.
From the official guest speakers list, the conference reflects the diverse present-day landscape of big data innovation and makes looking forward to a more inclusive future one of its main objectives. Corporate executives will appear alongside startup CTOs now producing the insights that make much of big enterprise data solutions possible. Both will offer their perspectives in a setting shared by journalists, historians, and university experts invited to contextualize the industry’s achievements and frame its challenges moving forward.
One of the main consensuses likely to emerge from the integrated lineup of workshops, panel discussions, presentations, and demonstrations is that the industry’s upside lies in looking inward. Even as fundamental tools like Hadoop, Cassandra, Storm, Spark/Shark, and Drill extend the broader possibilities of big data further and further, smaller, more specialized services have emerged to deliver usable insights for businesses, governments, and organizations in general. Looking ahead to the next few years, it’s that new bottom-up dynamic that holds some of the industry’s strongest promise.
As a result, a recent IDC report predicts big data, as an industry, will ride 27% compounded annual growth to reach $32.4 billion globally by 2017. According to O’Reilly statistics, data science job posts have already jumped 89% year-over-year, with data engineering openings rising by 38%. Gartner placed “advanced, pervasive, invisible analytics” at #4 on its list of top strategic IT trends for 2015, a prediction informed by the increasing ubiquity of mobile computing devices and standardization of built-in analytics within the mobile app industry. In addition, Gartner noted that the era of the smart machine is upon us, and predicted that it will be the most disruptive in the history of IT.
Strata + Hadoop 2014 is the place to understand this process and its far-reaching implications. Big tech brands were to be expected, but even at first glance, the sheer range of industries in attendance speaks volumes. Iconic industry names from banking, manufacturing, energy, utilities, and telecom have all registered for the conference, eager to make connections with the smaller software services also on display and to learn what the newly stratified face of big data could mean in their respective fields.
Big data is finally beginning to incorporate innovation that carries huge impact for the world, and for you and for me. So, let’s gear up to hear what the Hadoop fraternity has to say about it and how the world responds.
About the author:

Sundeep Sanghavi is the CEO and Co-Founder of DataRPM, an award-winning industry pioneer in smart machine analytics for big data.
DataRPM will be exhibiting at the Strata + Hadoop World conference this year; stop by booth P22 and get a chance to win a trip to London, including a private Sherlock Holmes tour.



Big Data, Hadoop and Spark Training

HadoopSphere provides the Apache Hadoop training that software engineers, developers and analysts need to understand and implement Big Data technology and tools.

Apache Spark - Developer Course (CSP01):

With HadoopSphere, you can start learning Apache Spark in a 2-day hands-on training course. This course teaches students how to develop real-time and interactive applications in Scala and Java using Apache Spark. Participants will perform hands-on sessions on Spark installed on Hadoop YARN-enabled infrastructure. Further, they will understand the concepts of, and perform exercises on, Spark Streaming, Spark SQL, Spark MLlib (machine learning) and GraphX (graph processing).
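As a flavor of the programming model the course covers, here is a minimal word count in the style of Spark’s RDD API, sketched in plain Python. The actual course uses Scala and Java on a real Spark cluster; the steps below merely mimic `flatMap`, `map` and `reduceByKey` with standard-library tools, and the function name is illustrative:

```python
from collections import defaultdict

def word_count(lines):
    # flatMap-style step: split each line into individual words
    words = [w for line in lines for w in line.split()]
    # map-style step: pair each word with an initial count of 1
    pairs = [(w, 1) for w in words]
    # reduceByKey-style step: sum the counts per word
    counts = defaultdict(int)
    for w, n in pairs:
        counts[w] += n
    return dict(counts)

print(word_count(["big data big hadoop", "spark streaming spark"]))
# {'big': 2, 'data': 1, 'hadoop': 1, 'spark': 2, 'streaming': 1}
```

On a cluster, each of these steps would run in parallel across partitions of the data rather than over a single in-memory list.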

Key Features:

- 2-Day Classroom Training
- Comprehensive Course Content
- Covers All Key Topics in Spark
- Extensive Hands-On Sessions
- Expert Trainer
- Practice Tests Included

CSP01 Course Curriculum:

Lesson 01 - Introduction to Big Data 
Lesson 02 - The need for Apache Spark
Lesson 03 - Job execution in Spark
Lesson 04 - Programming in Spark
Lesson 05 - Spark Streaming
Lesson 06 - Spark SQL
Lesson 07 - MLlib
Lesson 08 - GraphX
Lesson 09 - Hadoop integration

Upcoming Batch:  

Mountain View, CA - April 2015

India - Bengaluru, KA - Feb 2015

Big Data and Apache Hadoop Course (CHD09):

With HadoopSphere, you can start learning Apache Hadoop in a 4-day hands-on training course. This course teaches students how to develop applications and analyze Big Data stored in the Hadoop Distributed File System (HDFS) using custom MapReduce programs and tools like Pig and Hive. Students will perform hands-on sessions on multiple real-life use cases. Other topics covered include data ingestion using Sqoop and Flume, and using the NoSQL database HBase.
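A custom MapReduce program of the kind the course covers can be sketched in Python following the Hadoop Streaming convention, where a mapper emits key/value pairs and a reducer receives them grouped by key. The classic "max temperature per year" exercise serves as an example; record format and function names here are illustrative:

```python
from itertools import groupby

def mapper(lines):
    # Map phase: emit (year, temperature) pairs from raw "year,temp" records,
    # much as a streaming mapper would write tab-separated pairs to stdout
    for line in lines:
        year, temp = line.split(",")
        yield (year, int(temp))

def reducer(pairs):
    # Hadoop sorts and groups mapper output by key before the reduce phase;
    # here we group the sorted pairs and keep the maximum temperature per year
    for year, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (year, max(t for _, t in group))

records = ["2013,35", "2014,41", "2013,38", "2014,29"]
print(dict(reducer(mapper(records))))
# {'2013': 38, '2014': 41}
```

In a real job, the shuffle-and-sort between the two phases is handled by the Hadoop framework across the cluster rather than by an in-process `sorted()`.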

Key Features:

- 4-Day Classroom Training
- Comprehensive Course Content
- Real-Life Project Use Case
- Extensive Hands-On Sessions
- Expert Trainer
- Practice Tests Included

CHD09 Course Curriculum:

Lesson 01 - Introduction to Big Data and Hadoop
Lesson 02 - Hadoop Architecture
Lesson 03 - Hadoop Deployment
Lesson 04 - HDFS
Lesson 05 - Introduction to MapReduce
Lesson 06 - Advanced HDFS and MapReduce
Lesson 07 - Pig
Lesson 08 - Hive
Lesson 09 - Sqoop, Flume
Lesson 10 - HBase
Lesson 11 - Zookeeper
Lesson 12 - Ecosystem and its Components


Our expert faculty has trained professionals from:
- Amdocs
- Aircom
- American Express
- Aon Hewitt
- Bain & Company
- Cognizant
- Ericsson
- Oracle
- Samsung
- Tata Consultancy Services
- Time Warner
- Wipro
Average Rating: 4.5 out of 5

Enroll Now:

Training classes are currently scheduled in the following countries. Send us an e-mail at scale[at] or contact us using this link.
- Canada
- UK
- Switzerland
- India
- Singapore

Request a call back:

You may also contact us for any queries or if you would like to discuss a custom, on-site training course.


10 parameters for Big Data networks

Big Data and Hadoop clusters involve heavy volumes of data and, in many instances, high velocity with bursty traffic patterns. With these clusters finding inroads into enterprise data centers, network designers have a few more requirements to take care of. Listed below are 10 parameters to evaluate while designing a network for a Big Data and Hadoop cluster.

10) Available and resilient

- Allows network designs with multiple redundant paths between the data nodes rather than one or two points of failure
- Supports upgrades without any disruption to the data nodes

9) Predictable

- Right-sizing the network configuration (1GbE/10GbE/100GbE switch capacity) to achieve predictable latency in the network
- Real-time latency may not be required for batch processing
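One common way to reason about right-sizing is the oversubscription ratio of a top-of-rack switch. The sketch below works through the arithmetic; the node counts and link speeds are illustrative, not a recommendation:

```python
def oversubscription_ratio(nodes, downlink_gbps, uplinks, uplink_gbps):
    # Ratio of total server-facing bandwidth to total uplink bandwidth;
    # the closer to 1:1, the more predictable latency stays under load
    return (nodes * downlink_gbps) / (uplinks * uplink_gbps)

# e.g. a rack of 40 data nodes on 10GbE with 4 x 40GbE uplinks
ratio = oversubscription_ratio(nodes=40, downlink_gbps=10, uplinks=4, uplink_gbps=40)
print(ratio)  # 2.5, i.e. a 2.5:1 oversubscribed rack
```

Batch-oriented Hadoop workloads often tolerate some oversubscription, while latency-sensitive workloads push the ratio toward 1:1.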

8) Holistic network

- one network can support all workloads: Hadoop, NoSQL, Warehouse, ETL, Web
- support Hadoop and existing storage systems like DAS, SAN, or NAS

7) Multitenant

- be able to consolidate and centralize Big Data projects
- have capability to leverage the fabric across multiple use cases

6) Network partitioning

- support separating Big Data infrastructure from other IT resources on the network
- support privacy and regulatory norms

5) Scale Out

- provide seamless transition as projects increase in size and number
- accommodate new traffic patterns and larger, more complex workloads

4) Converged/ unified fabric network

- target a flatter and converged network with Big Data as an additional configurable workload
- provide virtual chassis architecture with provision to logically manage access to multiple switches as a single device

3) Network intelligence

- carry any-to-any traffic flows of Big Data as well as traditional clusters over an Ethernet connection
- manage a single network fabric irrespective of data requirements or storage design

2) Enough bandwidth for data node network

- provision data nodes with enough bandwidth for efficient job completion
- do cost/benefit trade-off on increasing data node uplinks

1) Support bursty traffic

- support operations such as loading files into HDFS (which triggers replication of data blocks) or writing mapper output files; these drive higher network use in a short period of time, causing traffic bursts in the network
- provide optimal buffering in network devices to absorb bursts
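To get a feel for why HDFS ingest is bursty, consider the back-of-the-envelope arithmetic below. The block size and replication factor shown are common HDFS defaults, but actual values depend on cluster configuration, and the function name is illustrative:

```python
def hdfs_ingest_traffic(file_size_gb, replication=3, block_size_mb=128):
    # Number of HDFS blocks the file splits into (ceiling division)
    blocks = -(-file_size_gb * 1024 // block_size_mb)
    # With the first replica typically written locally, roughly
    # (replication - 1) full copies of the file traverse the network
    network_gb = file_size_gb * (replication - 1)
    return blocks, network_gb

blocks, network_gb = hdfs_ingest_traffic(100)
print(blocks, network_gb)  # a 100 GB load: 800 blocks, ~200 GB on the wire
```

Concentrating that volume into the short window of a load job is what produces the bursts that switch buffers must absorb.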
