
Leveraging Hadoop with in-memory computing

Continuing the report on in-memory computing solutions, this post looks at 15 more companies that are leveraging Hadoop. A few notes on the methodology used:
  1. The companies listed below have been independently validated as using in-memory computing in their solutions.
  2. Hadoop may not appear as a direct use case in some of these solutions. Our focus is on how Hadoop and in-memory computing complement each other rather than being each other's bête noire.
  3. The open source solution Apache Gora is not listed here; it is covered at length in a separate post.
Listed below are 15 more in-memory computing (IMC) companies and their Hadoop integration tactics.
Company | Product | Hadoop integration tactic
16. PrismTech | OpenSplice | PrismTech's OpenSplice DDS enables seamless, timely, scalable and dependable distributed data sharing from embedded and mobile devices to the enterprise and the cloud. OpenSplice Gateway supports Apache Camel connectors for big data stores, which helps stream data to and from HBase.
17. QlikTech | QlikView | The ODBC Connector for QlikView lets enterprise users access Hadoop data through the QlikView business intelligence application. The driver translates Open Database Connectivity (ODBC) calls from QlikView into SQL and passes the SQL queries to the underlying Impala or Hive engines.
18. Red Hat | Red Hat Enterprise MRG Grid & JBoss Grid | Red Hat Enterprise MRG Grid aims to provide job and resource scalability, job throughput and performance in heterogeneous environments. It provides a common interface for submitting, monitoring and reporting on Hadoop MapReduce jobs. It also aims to match jobs with resources intelligently, and it dispatches jobs and data to resources while managing multiple users, groups and resources.
19. SAP | SAP Real-Time Data Platform | SAP Real-Time Data Platform combines Hadoop with the following SAP technologies: SAP HANA (in-memory database for faster access); SAP Sybase IQ (for 'federated' queries that treat HDFS/Hive as an external database and join the results in Sybase); SAP Data Services (for data interactions using HiveQL or Pig); SAP Information Steward (for improving data quality); SAP Sybase Event Stream Processor (real-time analysis of events and data, pushing the required data to HDFS); SAP BusinessObjects, SAP BusinessObjects Explorer and SAP Crystal Reports (for reporting, visualization and building data models).
20. SAS Institute | SAS/ACCESS Interface to Hadoop | SAS In-Memory Analytics is used by multiple solutions and capabilities in the SAS product family (for instance, SAS Visual Analytics). For Hadoop integration, SAS/ACCESS retrieves big data stored in HDFS and allows the use of other capabilities, such as the Pig and Hive languages and the MapReduce framework. SAS programmers can submit MapReduce, scripting and HDFS commands from within Base SAS. SAS also supports external file references, so Hadoop files can be found and used from any SAS product.
21. ScaleOut Software | ScaleOut hServer | ScaleOut hServer provides an integrated in-memory data grid (IMDG) and computation engine that executes standard Hadoop MapReduce code in parallel and in memory through its own Hadoop MapReduce execution engine. Like Hadoop, ScaleOut hServer performs data-parallel computation in which application code is sent to every node in the grid. However, unlike standard Hadoop, which stores data on disk and moves it multiple times during processing, ScaleOut hServer's integrated IMDG minimizes data motion by allowing input and output data sets, as well as intermediate data, to be stored in the IMDG. Customized grid record readers and writers for IMDG data efficiently pipeline key/value pairs to the mappers and from the reducers.
22. Software AG | Terracotta BigMemory-Hadoop Connector | Terracotta's BigMemory-Hadoop Connector lets Hadoop jobs write data directly into BigMemory, Terracotta's in-memory data management platform. Downstream applications get instant access to Hadoop results by reading from BigMemory. Hadoop jobs also execute faster, as they can write to memory instead of disk (HDFS).
23. SQLstream | SQLstream Connector for Hadoop | SQLstream is a 'NoDatabase' technology: no data is stored; instead, data arrives as real-time streams and is processed in memory. The SQLstream Connector for Hadoop provides continuous integration with Hadoop HBase, enabling organizations both to run real-time analytics over the arriving data and to persist machine data and streaming intelligence in HBase for further analysis.
24. Teradata | Teradata Intelligent Memory | Teradata Intelligent Memory is a solution that fits into the overall Unified Data Architecture strategy, which leverages Teradata, Teradata Aster, and open source Apache™ Hadoop. Frequently used data in Apache Hadoop can be accessed through Teradata SQL-H and, based on the temperature of the data, moved to Intelligent Memory to take advantage of its high-performance computing capability.
25. Tervela | Tervela Turbo | Tervela Turbo, a high-performance data movement engine, helps implement mission-critical Hadoop systems with reliable data capture, high-speed data loading into HDFS, disaster recovery for Hadoop, and ETLT data warehousing. Tervela Turbo's no-loss data fabric technology captures data from virtually any source system and transports it to HDFS for loading, eliminating failures that stem from common file transfer and network errors.
26. TIBCO Software | TIBCO ActiveSpaces | TIBCO ActiveSpaces uses a connector approach to integrate with Hadoop MapReduce, Pig and Hive. The first part integrates ActiveSpaces into the core MapReduce functionality and provides an InputFormat and an OutputFormat for ActiveSpaces. For data flows in Pig, the connector supplies a LoadFunc and a StoreFunc that allow full interoperability between Pig and ActiveSpaces. For Hive, the connector supplies a StorageHandler so that ActiveSpaces data can be used from HiveQL.
27. Vitria | Vitria Operational Intelligence (OI) | Vitria OI's Hadoop connector allows the solution to work alongside Hadoop. While Hadoop delivers deep insights into massive volumes of stored data, Vitria OI complements it by providing continuous, real-time insights into streaming data together with the ability to take proactive, automated action. The Hadoop connector allows data streaming into Vitria OI to be persisted in Hadoop and queried for historical and baseline analysis. Data captured by Hadoop can be streamed into Vitria OI, which provides instant analysis and timely detection of opportunities and threats. Data streamed from Hadoop can also be used within the Vitria OI Apps, which let business users create value directly from Hadoop data.
28. VoltDB | VoltDB Enterprise Edition | VoltDB Enterprise Edition V1.3.3 contains a new export client that exports data from VoltDB into a Hadoop distributed file system (HDFS) using Apache Sqoop. The export-to-Hadoop client operates in much the same way as the existing export-to-file client, but gives exported data access to the format flexibility that Sqoop offers. The VoltDB export client automatically manages periodic Sqoop jobs based on your configuration.
29. Workday | Workday Big Data Analytics | Workday Big Data Analytics incorporates Datameer's Hadoop-driven analytics platform, which enables users to integrate, analyze, and visualize data of any type, size, or source. On the back end, the setup will initially be based on Amazon Web Services, including S3, EC2, and Elastic MapReduce. In the future, the setup will run on Cloudera's Hadoop distribution hosted in Workday's data center.
30. WSO2 | WSO2 Complex Event Processor | WSO2 CEP integrates tightly with WSO2 Business Activity Monitor, which supports recording and post-processing of events with MapReduce via Apache Hadoop. Business Activity Monitor provides SQL-like flexibility for writing analysis algorithms via Apache Hive, and extensibility via analysis algorithms implemented in Java. Analysis results can be stored flexibly, including in Apache Cassandra, a relational database or a file system.
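
To make the in-memory MapReduce idea from entry 21 more concrete, here is a minimal Python sketch. It is not ScaleOut hServer's API and runs in a single process: an ordinary dictionary stands in for the in-memory data grid, and the input records, the intermediate key/value pairs and the results all stay in memory. The names `run_in_memory_mapreduce`, `word_count_map` and `word_count_reduce` are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict

def word_count_map(key, line):
    """Emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1

def word_count_reduce(word, counts):
    """Sum the counts collected for one word."""
    return sum(counts)

def run_in_memory_mapreduce(records, map_fn, reduce_fn):
    """Run map, shuffle and reduce without writing anything to disk.

    `records` is a plain dict standing in for an in-memory data grid
    (an assumption made for this sketch).
    """
    # Map phase: intermediate pairs are grouped by key in memory (the shuffle).
    grouped = defaultdict(list)
    for key, value in records.items():
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)
    # Reduce phase: results go into another in-memory dict.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}

grid = {
    1: "Hadoop stores data on disk",
    2: "an in-memory grid keeps data in memory",
}
result = run_in_memory_mapreduce(grid, word_count_map, word_count_reduce)
print(result["data"])  # prints 2
```

A real grid would partition `records` across nodes and run the map and reduce functions next to the data; the sketch only shows that no step needs to touch disk.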

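The temperature-based data movement described in entry 24 can be illustrated with a small Python sketch. This is not Teradata's implementation: it assumes that "temperature" simply means how often a block has been read, and the class `TemperatureTier` and its `capacity` parameter are hypothetical. The hottest blocks are copied into a small in-memory tier, while everything else stays in a slower store.

```python
from collections import Counter

class TemperatureTier:
    """Keep the hottest blocks in memory; the rest stay in the cold store."""

    def __init__(self, cold_store, capacity):
        self.cold_store = cold_store    # stands in for data held in Hadoop (assumption)
        self.capacity = capacity        # number of blocks the memory tier can hold
        self.access_counts = Counter()  # "temperature" = number of reads per block
        self.memory = {}                # the in-memory tier

    def read(self, block_id):
        """Return a block, recording the access to update its temperature."""
        self.access_counts[block_id] += 1
        if block_id in self.memory:
            return self.memory[block_id]  # hot path: served from memory
        return self.cold_store[block_id]  # cold path: served from the slower store

    def rebalance(self):
        """Promote the `capacity` hottest blocks and drop the rest from memory."""
        hottest = [b for b, _ in self.access_counts.most_common(self.capacity)]
        self.memory = {b: self.cold_store[b] for b in hottest}

store = {
    "sales_2013": "rows for 2013",
    "sales_2005": "rows for 2005",
    "web_logs": "raw log lines",
}
tier = TemperatureTier(store, capacity=1)
for _ in range(5):
    tier.read("sales_2013")
tier.read("web_logs")
tier.rebalance()
print(sorted(tier.memory))  # prints ['sales_2013']
```

Counting reads is the simplest possible temperature measure; a production system would also have to age old accesses so that data which was hot last year can cool down.
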


  1. It's good to see that many companies are using the in-memory computation feature. Can you write an article on how to implement in-memory computation? It will be helpful for us.

  2. In-memory computation is the future of computing; both cloud and in-memory computation are going to be really big.

