
Hadoop world and SAP Shops coming closer

With major partnership announcements made recently between SAP HANA and the Hadoop distributions, there is now a formal channel for Hadoop to make inroads into SAP shops. 'SAP shop' is an informal term for captive enterprise organizations that use SAP for ERP, SCM, CRM, HR and other enterprise needs. These announcements are probably one of the biggest enterprise pushes for Hadoop this year, alongside EMC’s Hawq and IBM’s PureData Systems blitz in the earlier quarters.

Symbiotic relationship

Within the startup and internet-company world, Hadoop has been used alongside in-memory technologies such as Redis and Memcached in a wide variety of integrated ways. Certain organizations have used a variant architecture in which HDFS serves as the data store, Redis as the data cache, and a database like MongoDB as the document store. However, Redis still has some way to go for enterprise adoption, besides its current constraints on scalability. Similarly, Memcached has its own limitations, such as data loss on node/process restart. It is probably this lacuna that SAP HANA aims to exploit among the ardent followers of Hadoop in internet companies. Although neither open source nor free, SAP HANA has also moved beyond license-based usage to cloud-based billing, which may make it a bit more appealing to the cost-conscious startup world.
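The cache-plus-store layering described above follows the familiar cache-aside pattern: read from the fast in-memory layer first, fall back to the durable store, and populate the cache on a miss. A minimal sketch, using plain dicts as stand-ins for Redis and the backing store (all class and attribute names here are illustrative, not any vendor's API):

```python
class CacheAsideStore:
    """Toy cache-aside layering: in-memory cache in front of a durable store."""

    def __init__(self):
        self.cache = {}        # fast in-memory layer (the Redis role)
        self.store = {}        # durable backing layer (the MongoDB/HDFS role)
        self.cache_hits = 0
        self.store_reads = 0

    def put(self, key, value):
        self.store[key] = value    # write to the durable store
        self.cache.pop(key, None)  # invalidate any stale cached copy

    def get(self, key):
        if key in self.cache:      # 1. try the cache first
            self.cache_hits += 1
            return self.cache[key]
        self.store_reads += 1      # 2. fall back to the durable store
        value = self.store[key]
        self.cache[key] = value    # 3. populate the cache for next time
        return value
```

The first read of a key goes to the store; repeat reads are served from the cache. The Memcached caveat above maps directly onto this sketch: losing `self.cache` on restart is harmless, but anything kept only there would be gone.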

Enterprise Architecture decisions

Beyond the initial hype and confusion about whether Hadoop would replace the RDBMS, or SAP HANA would replace Hadoop, there is now a much larger consensus among enterprise adopters that RDBMS, NoSQL databases, Hadoop and SAP HANA will co-exist. It is largely accepted that data with a pre-defined schema fits better in a database, whether relational, NoSQL or in-memory. On the other hand, if the data is largely unstructured and better left to a programmer or analyst to explore, Hadoop may be the better choice. Similarly, data in the petabyte or exabyte range is a natural fit for Hadoop, while data under roughly 5 TB would fit better in a database. Keep in mind that the 5 TB figure includes metadata, temporary tables and buffer size.
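The rule of thumb above can be condensed into a simple routing heuristic. The function name and the structure are illustrative only; the 5 TB threshold is the figure quoted in this post, not a universal constant:

```python
def suggest_platform(size_tb, has_predefined_schema):
    """Toy placement heuristic from the rule of thumb above.

    size_tb should include metadata, temporary tables and buffers,
    not just the raw data.
    """
    if has_predefined_schema and size_tb < 5:
        return "database"   # relational, NoSQL or in-memory
    return "hadoop"         # unstructured or very large data
```

In practice the call is rarely this clean, as the next paragraph notes, since price, skills and existing vendor systems all weigh in.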

On price, Hadoop makes an appealing cost case at lower data volumes, while in the higher volume range you may have to take a prudent call between the two technologies. Skills are no longer much of an issue, since most organizations prefer to train their own people, and over time there will be a decent pool of practitioners on the winning products among these technologies. Similarly, within the enterprise architecture, if an organization runs multiple vendor systems for SCM, ERP, payroll, campaign management and so on, it is more than likely that it will want to consolidate its data in a central warehouse or a Hadoop-driven repository.

SAP offering

SAP Real-Time Data Platform now combines Hadoop with the following SAP technologies:
- SAP HANA – in-memory database for faster access;
- SAP Sybase IQ – for 'federated' queries, treating HDFS/Hive as an external database and joining the results in Sybase;
- SAP Data Services – for data interactions using HiveQL or Pig;
- SAP Information Steward – for improving data quality;
- SAP Sybase Event Stream Processor – for real-time analysis of events and data, pushing the required data to HDFS;
- SAP BusinessObjects, SAP BusinessObjects Explorer and SAP Crystal Reports – for reporting, visualization and building data models.
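The 'federated' query idea in the list above is worth unpacking: the engine exposes Hive/HDFS data as an external table and joins it against a native table in a single query. A conceptual sketch, using Python's sqlite3 purely as a runnable stand-in for Sybase IQ (the table names `external_hive` and `local_sales` are made up for illustration; this is not the SAP stack):

```python
import sqlite3

# 'external_hive' stands in for data surfaced from HDFS/Hive;
# 'local_sales' stands in for a native Sybase IQ table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE external_hive (cust_id INTEGER, clicks INTEGER)")
conn.execute("CREATE TABLE local_sales (cust_id INTEGER, revenue REAL)")
conn.executemany("INSERT INTO external_hive VALUES (?, ?)",
                 [(1, 120), (2, 45)])
conn.executemany("INSERT INTO local_sales VALUES (?, ?)",
                 [(1, 999.0), (2, 100.0)])

# The federating engine joins results from both sources in one query;
# the user never ships the Hive data into the warehouse by hand.
rows = conn.execute(
    "SELECT l.cust_id, l.revenue, e.clicks "
    "FROM local_sales l JOIN external_hive e ON l.cust_id = e.cust_id "
    "ORDER BY l.cust_id"
).fetchall()
```

The appeal of the pattern is that cold, bulky data can stay in HDFS while the warehouse still answers joined queries over it.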

We will keep a watch on enterprise success stories, but for now the merging of the two worlds looks like a win-win at both ends.

