Introducing Hadoop FlipBooks

In line with the learning theme that HadoopSphere has been evangelizing, we are pleased to introduce a new feature named FlipBooks. A Hadoop flipbook is a quick reference guide for a topic, giving a short summary of key concepts in the form of Q&A. Typically built around a set of 4 questions, it tests your knowledge of the concept.

To begin with, 4 flipbooks have been posted:
  • Apache Spark
  • Hadoop Security
  • HDFS
  • MapReduce
Check back each week for more flipbooks.

To access flipbooks, you may proceed to

Read more »

Big Data and Hadoop Training

HadoopSphere provides the Apache Hadoop training that software engineers, developers and analysts need to understand and implement Big Data technology and tools.

Big Data and Apache Hadoop Level 1 Course:

With HadoopSphere, you can start learning Apache Hadoop in a 4-day hands-on training course. This course teaches students how to develop applications and analyze Big Data stored in the Hadoop Distributed File System (HDFS) using custom MapReduce programs and tools like Pig and Hive. Students will work through hands-on sessions on multiple real-life use cases. Other topics covered include data ingestion using Sqoop and Flume, and using the NoSQL database HBase.

Key Features:

- 4-day classroom training
- Comprehensive course content
- Real-life project use case
- Extensive hands-on sessions
- Expert trainer
- Practice tests included

Course Curriculum:

Lesson 01 - Introduction to Big Data and Hadoop
Lesson 02 - Hadoop Architecture
Lesson 03 - Hadoop Deployment
Lesson 04 - HDFS
Lesson 05 - Introduction to MapReduce
Lesson 06 - Advanced HDFS and MapReduce
Lesson 07 - Pig
Lesson 08 - Hive
Lesson 09 - Sqoop, Flume
Lesson 10 - HBase
Lesson 11 - Zookeeper
Lesson 12 - Ecosystem and its Components


Our expert faculty has trained professionals from:
- Amdocs
- Aircom
- American Express
- Aon Hewitt
- Bain & Company
- Cognizant
- Oracle
- Samsung
- Tata Consultancy Services
- Wipro
Average Rating: 4.5 out of 5

Enroll Now:

Training classes are currently scheduled in the following countries. Send us an e-mail at scale[at] or contact us using this link.
- USA/Canada
- UK
- Switzerland
- India
- Singapore

Request a call back:

You may also contact us for any queries or if you would like to discuss a custom, on-site training course.


10 parameters for Big Data networks

Big Data and Hadoop clusters involve heavy volumes of data and, in many instances, high-velocity traffic in bursty patterns. With these clusters making inroads into enterprise data centers, network designers have a few more requirements to take care of. Listed below are 10 parameters to evaluate when designing a network for a Big Data and Hadoop cluster.

10) Available and resilient

- allow network designs with multiple redundant paths between the data nodes rather than one or two points of failure
- support upgrades without any disruption to the data nodes

9) Predictable

- right-size the network configuration (1GbE/10GbE/100GbE switch capacity) to achieve predictable network latency
- note that real-time latency may not be required for batch processing

8) Holistic network

- use one network to support all workloads: Hadoop, NoSQL, Warehouse, ETL, Web
- support Hadoop and existing storage systems like DAS, SAN, or NAS

7) Multitenant

- be able to consolidate and centralize Big Data projects
- have capability to leverage the fabric across multiple use cases

6) Network partitioning

- support separating Big Data infrastructure from other IT resources on the network
- support privacy and regulatory norms

5) Scale Out

- provide seamless transition as projects increase in size and number
- accommodate new traffic patterns and larger, more complex workloads

4) Converged/unified fabric network

- target a flatter and converged network with Big Data as an additional configurable workload
- provide virtual chassis architecture with provision to logically manage access to multiple switches as a single device

3) Network intelligence

- carry any-to-any traffic flows of Big Data as well as traditional clusters over an Ethernet connection
- manage a single network fabric irrespective of data requirements or storage design

2) Enough bandwidth for data node network

- provision data nodes with enough bandwidth for efficient job completion
- do cost/benefit trade-off on increasing data node uplinks
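
One way to frame the uplink trade-off is the rack oversubscription ratio: total data node bandwidth divided by total uplink bandwidth. The sketch below is only an illustration; the function name, the rack size and the link speeds are hypothetical and not taken from any specific deployment.

```python
def oversubscription_ratio(nodes_per_rack, node_link_gbps, uplinks, uplink_gbps):
    """Return total data node bandwidth divided by total uplink bandwidth."""
    if min(nodes_per_rack, node_link_gbps, uplinks, uplink_gbps) <= 0:
        raise ValueError("all arguments must be positive")
    downstream_gbps = nodes_per_rack * node_link_gbps  # all data node links in the rack
    upstream_gbps = uplinks * uplink_gbps              # all uplinks leaving the rack
    return downstream_gbps / upstream_gbps


if __name__ == "__main__":
    # Hypothetical rack: 20 data nodes on 10GbE, with 1, 2 or 4 uplinks of 40GbE each.
    for uplinks in (1, 2, 4):
        ratio = oversubscription_ratio(20, 10, uplinks, 40)
        print(f"{uplinks} uplink(s): {ratio:.2f}:1 oversubscription")
```

A lower ratio means less contention when many data nodes talk across racks at once, but each added uplink costs switch ports and cabling, which is the cost/benefit question raised above.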

1) Support bursty traffic

- support loading files into HDFS, which triggers replication of data blocks, and writing mapper output files; both lead to higher network use in a short period of time, causing bursts of traffic in the network
- provide optimal buffering in network devices to absorb bursts
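
To get a rough feel for the size of such a burst, the following back-of-the-envelope sketch estimates the traffic generated by loading a file into HDFS. It assumes the HDFS default replication factor of 3, a client that is not itself a data node, decimal gigabytes, and no protocol overhead; the function names and the 100 GB file are illustrative only.

```python
def replication_traffic_gb(file_size_gb, replication_factor=3):
    """Data copied between data nodes: every replica beyond the first."""
    if file_size_gb < 0 or replication_factor < 1:
        raise ValueError("invalid file size or replication factor")
    return file_size_gb * (replication_factor - 1)


def total_load_traffic_gb(file_size_gb, replication_factor=3):
    """Client upload plus replication traffic (assumes a remote client)."""
    return file_size_gb + replication_traffic_gb(file_size_gb, replication_factor)


def transfer_seconds(data_gb, link_gbps):
    """Ideal time to move data_gb gigabytes over one link of link_gbps."""
    if link_gbps <= 0:
        raise ValueError("link speed must be positive")
    return data_gb * 8 / link_gbps  # 1 GB = 8 Gb


if __name__ == "__main__":
    size_gb = 100  # hypothetical file size
    print("replication traffic (GB):", replication_traffic_gb(size_gb))
    print("total traffic (GB):", total_load_traffic_gb(size_gb))
    for gbps in (1, 10, 100):
        secs = transfer_seconds(size_gb, gbps)
        print(f"one copy over {gbps}GbE: {secs:.0f} s")
```

The point of the estimate is that a single load multiplies the data on the wire by the replication factor, which is why buffering and link sizing matter even for batch workloads.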


Wearable Hadoop technology

While Hadoop and MapReduce have long been harping on distributed parallel processing on commodity hardware, some Hadoop enthusiasts have taken this too far. Enter Datasayer from Edward J Yoon, who has built wearable Hadoop technology. This means that every time you walk, jump, blink your eyes or move your hand, you would be using your kinetic energy to run a MapReduce job.

Read below to understand this breathtaking innovation.

  • Hadoop Glass: Inspired by Google Glass, this forms the client layer of your Hadoop cluster. Using the eyewear interface, you can trigger a MapReduce job or fire a Pig, Hive or Hama query.
  • Hadoop Watch: Inspired by Samsung Gear (Watch) technology, this forms the name node of your Hadoop cluster and stores the metadata for the data stored in the data nodes. Further, the job tracker is also hosted on the watch itself and, using MRv1, controls the execution of tasks on different nodes.
  • Hadoop Shoes: These form the data node layer of the cluster. By default the replication factor is 2 for each block resident in a data node shoe. Each time you walk or jump, the kinetic energy is converted to CPU cycles and powers the tasks running via the tasktracker on each shoe.
  • Hadoop Kinect: Inspired by Microsoft’s Kinect technology, this lets you configure the shoes (a.k.a. data nodes) of another person within a vicinity of 100 meters. The architecture also leverages advanced wireless technology to communicate between nodes. From a scalability perspective, all you need to do is have more people jumping or walking with data node shoes in the vicinity.

If all this sounds too good to be true, well, you guessed it right. This is an All Fools' Day prank by Datasayer.
