As enterprise Hadoop projects pick up steam, we need an effective delivery methodology for Hadoop and machine learning projects. It is evident to any industry player that the traditional waterfall methodology will not work for such projects. At the same time, the typical hackathon development attitude of startup culture may not fit the CMMI criteria of multi-million-dollar enterprise bids.
Hadoopsphere presents a draft delivery methodology for Hadoop development projects. The draft still needs to go through further review, and comments are welcome to make it a robust methodology.
Some notes on this delivery methodology:
1) Agile delivery techniques are proposed, which may include Scrum- or Kanban-based delivery. While Scrum relies on time-bound iterations/sprints, Kanban relies on exploratory task/event-based assignment and need not be time-bound. Kanban is well suited for initial product or application development, while Scrum may be recommended for subsequent releases.
2) The methodology attempts to accommodate both product and services project development; further inputs are welcome.[1]
3) The methodology has been tailored to align with the Big Data Analytics Process proposed in the Gartner report.[2]
4) The artifacts that play a key role in Hadoop projects include, but are not limited to, the Hardware & Software Plan along with the Data Management Plan. The other proposed artifacts are shown in the figure above.
[1] Key inputs from Sachin Ghai.
[2] Use Kanban to Manage Big Data Development, Nathan Wilson, Gartner, August 2012.