Strata + Hadoop World for you and for me

Big Data has a big marketing problem: it’s been sold as a universal game-changer, but is perceived by some in the general public as overblown hype or a toy for the exclusive benefit of those few companies that can already afford it.
The situation hasn’t been helped by the recent revelations of widespread NSA communications monitoring, which have added to the not-unreasonable fears that always existed around mass-scale data collection.
Industry insiders see the immense progress being made on the technology front and understand the transformative power those shifts hold for society. But society, on the whole, may harbor some doubts.
It’s refreshing, then, to see that this year’s Strata + Hadoop World conference in New York (Oct. 15-17, 2014) is taking these concerns seriously and making a concerted effort to address them.
Industry ethics, data security, and privacy issues are among the main focuses of this year’s conference. Perhaps more importantly, the event, already a dependable weathervane for big data development, embodies the internal recognition that the industry, for the sake of its own growth as much as society’s, needs to make meaningful results accessible to the broader market.
Judging from the official speaker list, the conference reflects the diverse present-day landscape of big data innovation and makes looking forward to a more inclusive future one of its main objectives. Corporate executives will appear alongside the startup CTOs now producing the insights that power much of enterprise big data. Both will offer their perspectives in a setting shared by journalists, historians, and university experts invited to contextualize the industry’s achievements and frame its challenges moving forward.
One conclusion likely to emerge from the integrated lineup of workshops, panel discussions, presentations, and demonstrations is that the industry’s upside lies in looking inward. Even as fundamental tools like Hadoop, Cassandra, Storm, Spark/Shark, and Drill extend the broader possibilities of big data further and further, smaller, more specialized services have emerged to deliver usable insights for businesses, governments, and organizations in general. Looking ahead to the next few years, it’s that new bottom-up dynamic that holds some of the industry’s strongest promise.
The numbers bear this out: a recent IDC report predicts big data, as an industry, will ride 27% compound annual growth to reach $32.4 billion globally by 2017. According to O’Reilly statistics, data science job posts have already jumped 89% year-over-year, with data engineering openings rising by 38%. Gartner placed “advanced, pervasive, invisible analytics” at #4 on its list of top strategic IT trends for 2015, a prediction informed by the increasing ubiquity of mobile computing devices and the standardization of built-in analytics within the mobile app industry. In addition, Gartner noted that the era of the smart machine is upon us, and predicted that it will be the most disruptive in the history of IT.
Strata + Hadoop World 2014 is the place to understand this process and its far-reaching implications. Big tech brands were to be expected, but even at first glance, the sheer range of industries in attendance speaks volumes. Iconic names from banking, manufacturing, energy, utilities, and telecom have all registered for the conference, eager to make connections with the smaller software services also on display and to learn what the newly stratified face of big data could mean in their respective fields.
Big data is finally beginning to incorporate innovation that carries huge impact for the world, and for you and for me. So let’s gear up to hear what the Hadoop fraternity has to say about it, and how the world responds.
About the author:

Sundeep Sanghavi is the CEO and Co-Founder of DataRPM, an award-winning industry pioneer in smart machine analytics for big data.
DataRPM will be exhibiting at the Strata + Hadoop World conference this year; stop by booth P22 and get a chance to win a trip to London, including a private Sherlock Holmes tour.



Introducing Hadoop FlipBooks

In line with the learning theme that HadoopSphere has been evangelizing, we are pleased to introduce a new feature named FlipBooks. A Hadoop flipbook is a quick reference guide for a topic, giving a short summary of key concepts in the form of Q&A. Typically with a set of four questions, it tests your knowledge of the concept.

To begin with, four flipbooks have been posted:
  • Apache Spark
  • Hadoop Security
  • HDFS
  • MapReduce
Check back each week for more flipbooks.

To access flipbooks, you may proceed to


Big Data and Hadoop Training

Hadoopsphere provides Apache Hadoop training that software engineers, developers and analysts need to understand and implement Big Data technology and tools.

Big Data and Apache Hadoop Level 1 Course:

With HadoopSphere, you can start learning Apache Hadoop in a 4-day hands-on training course. This course teaches students how to develop applications and analyze Big Data stored in the Hadoop Distributed File System (HDFS) using custom MapReduce programs and tools like Pig and Hive. Students will work through hands-on sessions on multiple real-life use cases. Other topics covered include data ingestion using Sqoop and Flume, and using the NoSQL database HBase.
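To give a flavor of the custom MapReduce programs the course covers, here is a minimal word-count sketch in the style of Hadoop Streaming, where the mapper and reducer exchange key/value pairs as text. This is an illustrative example only, not part of the course material; the shuffle phase is simulated here with a plain sort.

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Emit (word, 1) pairs, as a Streaming mapper would from stdin."""
    for line in lines:
        for word in line.strip().split():
            yield (word.lower(), 1)

def reducer(pairs):
    """Sum counts per word; assumes pairs arrive grouped by key,
    which sorting simulates in place of Hadoop's shuffle phase."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

if __name__ == "__main__":
    docs = ["Big Data and Hadoop", "Hadoop stores Big Data in HDFS"]
    counts = dict(reducer(mapper(docs)))
    print(counts["hadoop"], counts["big"])  # 2 2
```

In a real cluster, the same mapper and reducer logic would run as separate processes on the data nodes, with HDFS supplying the input splits and the framework performing the sort-and-shuffle between the two phases.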

Key Features:

- 4-Day Classroom Training
- Comprehensive Course Content
- Real-Life Project Use Case
- Extensive Hands-On Sessions
- Expert Trainer
- Practice Tests Included

Course Curriculum:

Lesson 01 - Introduction to Big Data and Hadoop
Lesson 02 - Hadoop Architecture
Lesson 03 - Hadoop Deployment
Lesson 04 - HDFS
Lesson 05 - Introduction to MapReduce
Lesson 06 - Advanced HDFS and MapReduce
Lesson 07 - Pig
Lesson 08 - Hive
Lesson 09 - Sqoop, Flume
Lesson 10 - HBase
Lesson 11 - Zookeeper
Lesson 12 - Ecosystem and its Components


Our expert faculty has trained professionals from:
- Amdocs
- Aircom
- American Express
- Aon Hewitt
- Bain & Company
- Cognizant
- Oracle
- Samsung
- Tata Consultancy Services
- Wipro
Average Rating: 4.5 out of 5

Enroll Now:

Training classes are currently scheduled in the following countries. Send us an e-mail at scale[at] or contact us using this link.
- USA/Canada
- UK
- Switzerland
- India
- Singapore

Request a call back:

You may also contact us for any queries or if you would like to discuss a custom, on-site training course.


10 parameters for Big Data networks

Big Data and Hadoop clusters involve heavy volumes of data and, in many instances, high-velocity, bursty traffic patterns. With these clusters finding inroads into enterprise data centers, network designers have a few more requirements to take care of. Listed below are 10 parameters to evaluate while designing a network for a Big Data and Hadoop cluster.

10) Available and resilient

- Allow network designs with multiple redundant paths between the data nodes rather than one or two points of failure
- Support upgrades without any disruption to the data nodes

9) Predictable

- Right-size the network configuration (1GbE/10GbE/100GbE switch capacity) to achieve predictable latency in the network
- Real-time latency may not be required for batch processing
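One common way to reason about right-sizing is the oversubscription ratio: worst-case demand from the server-facing ports divided by uplink capacity. The sketch below uses purely illustrative numbers (node counts, link speeds) chosen for the example, not sizing recommendations.

```python
# Rough oversubscription estimate for a top-of-rack switch.
# All figures below are assumed for illustration.

def oversubscription(nodes, node_link_gbps, uplinks, uplink_gbps):
    """Ratio of worst-case server-side demand to uplink capacity.
    1.0 means non-blocking; higher ratios mean less predictable
    latency when many data nodes transmit at once (e.g. shuffle)."""
    return (nodes * node_link_gbps) / (uplinks * uplink_gbps)

# e.g. 40 data nodes on 10GbE behind four 40GbE uplinks
ratio = oversubscription(nodes=40, node_link_gbps=10, uplinks=4, uplink_gbps=40)
print(f"{ratio:.1f}:1 oversubscription")  # 2.5:1
```

For latency-tolerant batch workloads a modest ratio may be acceptable; the point of the parameter is that the ratio is chosen deliberately rather than discovered under load.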

8) Holistic network

- One network can support all workloads: Hadoop, NoSQL, warehouse, ETL, web
- Support Hadoop alongside existing storage systems like DAS, SAN, or NAS

7) Multitenant

- Be able to consolidate and centralize Big Data projects
- Have the capability to leverage the fabric across multiple use cases

6) Network partitioning

- Support separating Big Data infrastructure from other IT resources on the network
- Support privacy and regulatory norms

5) Scale Out

- Provide a seamless transition as projects increase in size and number
- Accommodate new traffic patterns and larger, more complex workloads

4) Converged/ unified fabric network

- Target a flatter, converged network with Big Data as an additional configurable workload
- Provide a virtual chassis architecture with provision to logically manage access to multiple switches as a single device

3) Network intelligence

- Carry any-to-any traffic flows of Big Data as well as traditional clusters over an Ethernet connection
- Manage a single network fabric irrespective of data requirements or storage design

2) Enough bandwidth for data node network

- Provision data nodes with enough bandwidth for efficient job completion
- Do a cost/benefit trade-off on increasing data node uplinks

1) Support bursty traffic

- Support operations such as loading files into HDFS, which triggers replication of data blocks, or writing mapper output files; both lead to higher network use in a short period of time, causing bursts of traffic in the network
- Provide optimal buffering in network devices to absorb bursts
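The scale of such a burst can be estimated from the replication factor: when the HDFS client runs on a data node, the first replica is written locally and roughly (replication − 1) copies of the file cross the network through the write pipeline. The sketch below uses assumed, illustrative figures (file size, link speed), not measurements from any particular cluster.

```python
# Back-of-envelope estimate of the network burst caused by loading a
# file into HDFS. Assumes the client runs on a data node, so the first
# replica stays local and (replication - 1) copies traverse the network.
# All input figures are illustrative assumptions.

def burst_gbits(file_gb, replication=3):
    """Cross-node traffic, in gigabits, generated by one HDFS write."""
    return file_gb * (replication - 1) * 8  # GB -> gigabits

def burst_seconds(file_gb, link_gbps, replication=3):
    """How long the write pipeline can keep a link near line rate."""
    return burst_gbits(file_gb, replication) / link_gbps

# Loading a 100 GB file over 10GbE with the default replication of 3:
print(round(burst_seconds(100, 10)), "seconds of near-line-rate traffic")  # 160
```

Even a single bulk load can therefore saturate links for minutes at a time, which is why switch buffering and uplink provisioning deserve explicit attention in the design.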
