
Leveraging Hadoop with in-memory computing

Continuing the report on in-memory computing (IMC) solutions, this post looks at 15 more companies that are leveraging Hadoop. A few notes on the methodology used:
  1. The companies listed below have been independently validated as using in-memory computing in their solutions.
  2. Hadoop may not appear as a direct use case in some of these solutions. Our focus is to investigate how Hadoop complements in-memory computing, rather than treating the two as each other's bête noire.
  3. The open source solution Apache Gora is not listed here. To read more extensively on it, click here.
Listed below are 15 more IMC companies and their Hadoop connection tactics.
Company | Product | Hadoop integration tactic
16. PrismTech | OpenSplice | PrismTech’s OpenSplice DDS enables seamless, timely, scalable and dependable distributed data sharing from embedded and mobile devices to the enterprise and the cloud. OpenSplice Gateway supports Apache Camel connectors for big data stores, which helps stream data to and from HBase.
17. QlikTech | QlikView | The ODBC Connector for QlikView enables enterprise users to access Hadoop data through the QlikView business intelligence application. The driver achieves this by translating Open Database Connectivity (ODBC) calls from QlikView into SQL and passing the SQL queries to the underlying Impala or Hive engines.
18. Red Hat | Red Hat Enterprise MRG Grid & JBoss Grid | Red Hat Enterprise MRG Grid aims to provide job and resource scalability, job throughput and performance in heterogeneous environments. It provides a common interface for submitting, monitoring and reporting on Hadoop MapReduce jobs. It also aims to provide the intelligence to match jobs with resources, and helps dispatch jobs and data to resources while managing multiple users, groups and resources.
19. SAP | SAP Real-Time Data Platform | SAP Real-Time Data Platform now combines Hadoop with the following SAP technologies:
- SAP HANA – in-memory database for faster access;
- SAP Sybase IQ – for ‘federated’ queries that treat HDFS/Hive as an external database and join the results in Sybase IQ;
- SAP Data Services – for data interactions using HiveQL or Pig;
- SAP Information Steward – for improving data quality;
- SAP Sybase Event Stream Processor – for real-time analysis of events and data, pushing the required data to HDFS;
- SAP BusinessObjects, SAP BusinessObjects Explorer and SAP Crystal Reports – for reporting, visualization and building data models.
20. SAS Institute | SAS/ACCESS Interface to Hadoop | SAS In-Memory Analytics is used by multiple solutions and capabilities in the SAS product family (for instance, SAS Visual Analytics). For Hadoop integration, SAS/ACCESS retrieves big data stored in HDFS and allows the use of other capabilities, such as the Pig and Hive languages and the MapReduce framework. SAS programmers can submit MapReduce, scripting and HDFS commands from within Base SAS. SAS also supports external file references, allowing you to conveniently find and use Hadoop files from any SAS product.
21. ScaleOut Software | ScaleOut hServer | ScaleOut hServer provides an integrated in-memory data grid (IMDG) and computation engine that executes standard Hadoop MapReduce code in parallel and in memory through its own Hadoop MapReduce execution engine. Like Hadoop, ScaleOut hServer performs data-parallel computation in which application code is sent to every node in the grid. However, unlike standard Hadoop, which stores data on disk and moves it multiple times during processing, ScaleOut hServer's integrated IMDG minimizes data motion by enabling input and output data sets, as well as intermediate data, to be stored in the IMDG. Customized grid record readers and writers for IMDG data efficiently pipeline key/value pairs to the mappers and from the reducers.
22. Software AG | Terracotta BigMemory-Hadoop Connector | Terracotta's BigMemory-Hadoop Connector lets Hadoop jobs write data directly into BigMemory, Terracotta’s in-memory data management platform. This enables downstream applications to get instant access to Hadoop results by reading from BigMemory. Hadoop jobs also execute faster, as they can now write to memory instead of disk (HDFS).
23. SQLstream | SQLstream Connector for Hadoop | SQLstream is a 'NoDatabase' technology: no data is stored. Instead, data arrives as real-time streams and is processed in memory. The SQLstream Connector for Hadoop provides continuous integration with Hadoop HBase, enabling organizations both to run real-time analytics over the arriving data and to persist machine data and streaming intelligence in Hadoop HBase for further analysis.
24. Teradata | Teradata Intelligent Memory | Teradata Intelligent Memory is another best-of-breed solution that fits into the overall Unified Data Architecture strategy, which leverages Teradata, Teradata Aster, and open source Apache™ Hadoop. Frequently used data in Apache Hadoop can be accessed through Teradata SQL-H and, based on the temperature of the data, moved to Intelligent Memory to take advantage of its high-performance computing capability.
25. Tervela | Tervela Turbo | Tervela Turbo, a high-performance data movement engine, helps to implement mission-critical Hadoop systems with reliable data capture, high-speed data loading into HDFS, disaster recovery for Hadoop, and ETLT data warehousing. Tervela Turbo's no-loss data fabric technology captures data from virtually any source system and transports it to HDFS for loading, eliminating failures that stem from common file transfer and network errors.
26. Tibco Software | TIBCO ActiveSpaces | TIBCO ActiveSpaces uses a connector approach to integrate with Hadoop MapReduce, Pig and Hive. The first part integrates ActiveSpaces into the core MapReduce functionality by providing an InputFormat and an OutputFormat for ActiveSpaces. For data flows in Pig, the connector supplies a LoadFunc and a StoreFunc that allow full interoperability between Pig and ActiveSpaces. For Hive, the connector supplies a StorageHandler so that HiveQL can be used with ActiveSpaces.
27. Vitria | Vitria Operational Intelligence (OI) | Vitria OI's Hadoop connector allows the solution to work synergistically alongside Hadoop. While Hadoop delivers deep insights into massive volumes of stored data, Vitria OI complements it by providing continuous, real-time insights into streaming data together with the ability to take proactive, automated action. The Hadoop connector allows data streaming into Vitria OI to be persisted in Hadoop, and queried to provide historical and baseline analysis. Data captured by Hadoop can be streamed into Vitria OI, which provides instant analysis and timely detection of opportunities and threats. Data streamed from Hadoop can also be used within the Vitria OI Apps, which empower business users to directly create value from Hadoop data.
28. VoltDB | VoltDB Enterprise Edition | VoltDB Enterprise Edition V1.3.3 contains a new export client that exports data from VoltDB into the Hadoop Distributed File System (HDFS) using Apache Sqoop. The export-to-Hadoop client operates in much the same way as the existing export-to-file client, but gives export data access to the format flexibility that Sqoop offers. The VoltDB export client automatically manages periodic Sqoop jobs based on your configuration.
29. Workday | Workday Big Data Analytics | Workday Big Data Analytics incorporates Datameer’s Hadoop-driven analytics platform, which enables users to integrate, analyze, and visualize data of any type, size, or source. On the back end, the setup will initially be based on Amazon Web Services, including S3, EC2, and Elastic MapReduce. In the future, the setup will run on Cloudera's Hadoop distribution hosted in Workday's data center.
30. WSO2 | WSO2 Complex Event Processor | WSO2 CEP integrates tightly with WSO2 Business Activity Monitor, which supports recording and post-processing of events with MapReduce via Apache Hadoop. Business Activity Monitor provides SQL-like flexibility for writing analysis algorithms via Apache Hive, and extensibility via analysis algorithms implemented in Java. Results from analysis can be stored flexibly, including in Apache Cassandra, a relational database or a file system.
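
To make the in-memory MapReduce idea from entry 21 (ScaleOut hServer) more concrete, here is a toy sketch in plain Python. It is not ScaleOut's API or engine: the names `run_in_memory_mapreduce`, `word_count_map` and `sum_reduce` are hypothetical, and an ordinary dictionary stands in for the in-memory data grid. The only point it tries to show is that the input, the intermediate key/value pairs and the output can all stay in memory, with nothing written to disk between phases.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-in for an in-memory data grid: a plain dict of records.
Grid = Dict[str, str]
Mapper = Callable[[str, str], Iterable[Tuple[str, int]]]
Reducer = Callable[[str, List[int]], int]


def word_count_map(key: str, value: str) -> Iterable[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in a record."""
    for word in value.lower().split():
        yield word, 1


def sum_reduce(key: str, values: List[int]) -> int:
    """Reducer: sum the counts collected for one key."""
    return sum(values)


def run_in_memory_mapreduce(grid: Grid, mapper: Mapper, reducer: Reducer) -> Dict[str, int]:
    # Map and shuffle: run the mapper on each record and group its output
    # by key. The intermediate data lives only in this in-memory mapping.
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for key, value in grid.items():
        for out_key, out_value in mapper(key, value):
            intermediate[out_key].append(out_value)
    # Reduce: the results are returned as another in-memory mapping.
    return {k: reducer(k, vs) for k, vs in intermediate.items()}


if __name__ == "__main__":
    input_grid = {"doc1": "hadoop in memory", "doc2": "in memory data grid"}
    print(run_in_memory_mapreduce(input_grid, word_count_map, sum_reduce))
```

A real grid would partition the records across nodes and ship the mapper and reducer code to each of them; this single-process version leaves that out on purpose.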
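
The data-temperature idea in entry 24 (Teradata Intelligent Memory) can be illustrated with a small Python sketch. This shows access-frequency tiering in general, not Teradata's actual algorithm: the class `TemperatureTier`, its `capacity` parameter and the use of a raw access count as the "temperature" are all assumptions made for the example. A dictionary stands in for the data held in Hadoop.

```python
from collections import Counter
from typing import Dict, Optional


class TemperatureTier:
    """Toy two-tier store: the hottest keys are held in a small memory tier."""

    def __init__(self, cold_store: Dict[str, str], capacity: int) -> None:
        self.cold_store = cold_store      # stands in for data in Hadoop
        self.capacity = capacity          # max entries in the memory tier
        self.memory: Dict[str, str] = {}  # stands in for the in-memory tier
        self.access_counts: Counter = Counter()

    def get(self, key: str) -> Optional[str]:
        """Return the value for `key`, or None if it does not exist."""
        if key not in self.cold_store:
            return None
        self.access_counts[key] += 1
        # Serve from memory when the key is hot, otherwise from the cold store.
        value = self.memory.get(key)
        if value is None:
            value = self.cold_store[key]
        self._rebalance()
        return value

    def _rebalance(self) -> None:
        # Keep only the `capacity` most frequently accessed keys in memory.
        hottest = [k for k, _ in self.access_counts.most_common(self.capacity)]
        self.memory = {k: self.cold_store[k] for k in hottest}


if __name__ == "__main__":
    tier = TemperatureTier({"a": "1", "b": "2", "c": "3"}, capacity=1)
    for key in ["a", "b", "b", "c"]:
        tier.get(key)
    print(sorted(tier.memory))  # the most frequently read key stays in memory
```

Rebalancing on every read keeps the sketch short; a production system would more plausibly measure temperature over a time window and move data in the background.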


Comments

  1. It's good to see that many companies are using in-memory computation. Can you write an article on how to implement in-memory computation? It would be helpful for us.

  2. In-memory computation is the future of computing; both cloud and in-memory computation are going to be really big.

