
Reviewing the Splunk run

Splunk, better known for its machine log data analysis software, recently hosted its annual conference, Splunk .conf2013, in Las Vegas. With the buzz in the air, it is a good time to assess Splunk's capabilities in the Hadoop arena.

Splunk’s Hadoop offerings

Splunk currently offers the following tools for Hadoop integration:
- Hunk: Splunk Analytics for Hadoop Beta
The latest offering from the Splunk stable, which aims to explore, analyze and visualize data in Hadoop.

- Splunk Hadoop Connect
Enables bi-directional data movement between Splunk and Hadoop: it forwards events from Splunk Enterprise to Hadoop and also imports data already stored in Hadoop.

- Hadoop Management
The Splunk App for HadoopOps enables real-time monitoring and analysis of the health and performance of the complete Hadoop environment, including Hadoop itself, the network, switches, racks, the operating system and the database.

- Shuttl
An archiving application for Splunk that can export data either in the native indexed Splunk bucket format or in a CSV format called the Splunk Interchange format. It supports attached storage, HDFS, S3 and S3n, and Amazon Glacier as back-end storage systems (a hand-rolled sketch of the kind of export-and-load flow these tools automate follows below).
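
Moving exported data into Hadoop by hand gives a feel for what Hadoop Connect and Shuttl automate. The sketch below is a minimal, hypothetical Python example: it assumes a CSV export already produced on the Splunk side and an existing HDFS target directory, and simply shells out to the standard hdfs command-line tool; all paths are placeholders, not real Splunk output locations.

# Hand-rolled export-and-load sketch: push a CSV export from the local
# filesystem into HDFS using the standard "hdfs dfs" command-line tool.
# Paths are hypothetical placeholders; the HDFS directory must already exist.
import subprocess

local_export = "/tmp/splunk_export/events_2013-10-01.csv"  # hypothetical CSV export
hdfs_target = "/data/splunk/archive/"                      # hypothetical HDFS directory

subprocess.check_call(["hdfs", "dfs", "-put", local_export, hdfs_target])
print("Loaded %s into HDFS at %s" % (local_export, hdfs_target))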


With Hunk, the company known for its user-friendly tools has tried to build a credible competitor to Cloudera Impala, IBM Big SQL, Hortonworks Tez and the rest of the analytics-on-Hadoop space. The beta release signals that Splunk now considers Hadoop an integral part of its big data and multi-structured data management strategy. Where Hadoop was earlier treated mainly as an archive repository for Splunk Enterprise, we can now look forward to solutions that actively use HDFS data in analysis.
 
Hunk: Splunk Analytics for Hadoop Beta
(image source: splunk.com)
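
To put active usage of HDFS data in concrete terms, here is a minimal sketch of the kind of job one would otherwise hand-code against raw logs sitting in HDFS, written for plain Hadoop Streaming in Python. The log format (whitespace-separated, with a severity field in the third column), the jar name and all paths are hypothetical assumptions, not a Splunk or Hunk format.

#!/usr/bin/env python
# count_by_level.py -- minimal Hadoop Streaming job: count log events per
# severity level for logs stored in HDFS. The input format (whitespace-
# separated, third field = INFO/WARN/ERROR) is a hypothetical example.
#
# Submit roughly like this (jar name and paths are placeholders):
#   hadoop jar hadoop-streaming.jar \
#     -input /logs/raw -output /logs/severity_counts \
#     -mapper "python count_by_level.py map" \
#     -reducer "python count_by_level.py reduce" \
#     -file count_by_level.py
import sys

def mapper():
    # Emit (severity, 1) pairs, tab-separated as Streaming expects.
    for line in sys.stdin:
        fields = line.split()
        if len(fields) >= 3:
            print("%s\t1" % fields[2])

def reducer():
    # Input arrives sorted by key, so counts can be accumulated per key.
    current, total = None, 0
    for line in sys.stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current:
            if current is not None:
                print("%s\t%d" % (current, total))
            current, total = key, 0
        total += int(value or 0)
    if current is not None:
        print("%s\t%d" % (current, total))

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()

Tools like Hunk aim to replace this kind of boilerplate with interactive, Splunk-style search over the same HDFS data.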

The competitive landscape

To its credit, Splunk has been expanding its customer base aggressively and boasts a CAGR of over 30%. As of January 31, 2013, Splunk Enterprise had more than 5,200 customers in more than 90 countries, including more than half of the Fortune 100. The stock price has been climbing and reached a lifetime high of $63.34, up from around $36 when it first listed in April 2012.

While revenues have increased, the company has been carrying losses on its books in its effort to boost sales, expand R&D and fund an acquisition. As revenue growth plateaus, our assessment is that Splunk is likely to face increasing pressure to cut those losses, which may limit its ability to innovate, sell and/or acquire. Rumors of a Splunk acquisition have been around the street for quite some time, and considering the financial indicators, it is possible the company may merge or be acquired, which could eventually help it optimize its sales, marketing and operating expenditure.

Splunk is upbeat about its cloud offerings and the SDKs it has released for developers and integrators to hook into its solutions. At the same time, Splunk is proprietary and does not come across as a big proponent of open source software. Further, Splunk Enterprise is known to be relatively expensive, and by its own admission, “Our pricing method may ultimately result in a higher total cost to users as data volumes increase over time”.

Splunk will continue to face competitive pressure from multiple sources. Many internet companies and startups have built tools and utilities to analyze logs by leveraging Hadoop, and notwithstanding Splunk's partnerships with Hadoop-focused companies like Cloudera, Hortonworks, MapR and Pentaho, there are now viable Hadoop-based alternatives for several of Splunk's functions. Security remains a strong point in Splunk's offering but faces direct competitive heat from large IT vendors, while web analytics and business intelligence suites will continue to challenge its market positioning. Overall, with its current customer base and reach, we expect Splunk to keep gaining market traction, though more towards the enterprise end of the spectrum and less towards internet and e-commerce companies.
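
On the developer angle mentioned above, the sketch below shows what hooking into Splunk programmatically can look like using the Splunk SDK for Python (splunklib): it connects to the management port, runs a one-shot search and prints the results. The host, credentials and search string are placeholder assumptions.

# Minimal sketch using the Splunk SDK for Python (splunklib). The host,
# credentials and search query below are placeholders, not real values.
import splunklib.client as client
import splunklib.results as results

service = client.connect(
    host="splunk.example.com",  # placeholder Splunk host
    port=8089,                  # default Splunk management port
    username="admin",
    password="changeme",
)

# Run a blocking one-shot search and iterate over the returned events.
stream = service.jobs.oneshot("search index=_internal | head 5")
for event in results.ResultsReader(stream):
    if isinstance(event, dict):  # skip informational messages from the reader
        print(event)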
