
Reviewing the Splunk run

Splunk, better known for its machine log data analysis software, recently hosted its annual conference, Splunk .conf2013, in Las Vegas. With the buzz in the air, this is a good time to assess Splunk’s capabilities in the Hadoop arena.

Splunk’s Hadoop offerings

Splunk currently offers the following tools for Hadoop integration:
- Hunk: Splunk Analytics for Hadoop Beta
The latest offering from the Splunk stable, aimed at exploring, analyzing and visualizing data in Hadoop.

- Splunk Hadoop Connect
Enables bi-directional integration for movement of data between Splunk and Hadoop. It forwards events from Splunk Enterprise to Hadoop and also imports data already stored in Hadoop.

- Hadoop Management
The Splunk App for HadoopOps enables real-time monitoring and analysis of the health and performance of the complete Hadoop environment, including Hadoop itself, the network, switch, rack, operating system and database.

- Shuttl
An archiving application for Splunk which can export data in either the native indexed Splunk bucket format or a CSV-based format called the Splunk Interchange format. It supports attached storage, HDFS, S3 and S3n, and Amazon Glacier as back-end storage systems.

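To make the bi-directional movement in the list above concrete, the sketch below shows one plausible way to push an exported Splunk event file into HDFS and read it back using Hadoop's stock WebHDFS REST API (`/webhdfs/v1`, `op=CREATE` / `op=OPEN`). The host name, port and HDFS paths are hypothetical placeholders; Splunk Hadoop Connect itself handles this movement through its own app, not this code.

```python
# Sketch: Splunk-to-HDFS movement over WebHDFS. The cluster host and
# export paths are made up; the REST URL layout (webhdfs/v1, op=...)
# comes from stock Hadoop's WebHDFS API.
from urllib.parse import urlencode

def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for a given HDFS path and operation."""
    query = urlencode(dict(params, op=op))
    return "http://%s:%d/webhdfs/v1%s?%s" % (host, port, path, query)

# Forwarding direction: write an exported event file into HDFS.
put_url = webhdfs_url("namenode.example.com", 50070,
                      "/splunk/events/2013-10-01.raw", "CREATE",
                      overwrite="true")

# Import direction: read a file already stored in HDFS back out.
get_url = webhdfs_url("namenode.example.com", 50070,
                      "/splunk/events/2013-10-01.raw", "OPEN")

print(put_url)
print(get_url)
```

An HTTP client would then issue a PUT against `put_url` (following the NameNode's redirect to a DataNode) and a GET against `get_url`.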

With Hunk, the company known for its user-friendly tools has tried to build a credible competitor to Cloudera Impala, IBM Big SQL, Hortonworks Tez and the rest of the analytics-on-Hadoop space. The beta release signals that Splunk considers Hadoop an integral part of its Big Data and multi-structured data management strategy. Where Hadoop was earlier limited to serving as an archive repository for Splunk Enterprise, we can now look forward to solutions that actively use HDFS data in analysis.
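The distinguishing idea behind Hunk-style analytics on HDFS is applying the schema at search time rather than at ingest: fields are extracted from raw events only as a query runs, with no prior indexing step. A minimal in-process illustration of that idea, using a hypothetical key=value log format:

```python
# Schema-on-read illustration: fields are pulled out of raw events
# at query time. The log format below is hypothetical.
from collections import Counter

raw_events = [
    "2013-10-01T10:00:01 host=web01 status=200",
    "2013-10-01T10:00:02 host=web02 status=500",
    "2013-10-01T10:00:03 host=web01 status=200",
]

def extract(event):
    """Apply the schema on read: split key=value pairs out of a raw line."""
    return dict(kv.split("=", 1) for kv in event.split()[1:])

# A "search-time" aggregation, akin to a `... | stats count by status`
# pipeline in Splunk's search language.
counts = Counter(extract(e)["status"] for e in raw_events)
print(counts)  # Counter({'200': 2, '500': 1})
```

Nothing about the files in HDFS had to change to answer the query; only the extraction logic did, which is what makes this model attractive for data already sitting in Hadoop.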
 
Hunk: Splunk Analytics for Hadoop Beta
(image source: splunk.com)

The competitive landscape

To its credit, Splunk has been expanding its customer base aggressively and boasts a CAGR of over 30%. As of January 31, 2013, Splunk Enterprise had more than 5,200 customers in more than 90 countries, including more than half of the Fortune 100 companies. The stock price has been zooming and reached a lifetime high of $63.34, up from around $36 when the company first listed in April 2012.

While revenues have increased, the company has been carrying losses on its balance sheet in its effort to boost sales, fund R&D and make an acquisition. As the company reaches an optimal revenue plateau, our assessment is that it will face increasing pressure to cut those losses. This may limit its ability to innovate, sell and/or acquire. Rumors of a Splunk acquisition have been around the street for quite some time, and considering the financial indicators, it is possible the company may merge or get acquired, which may eventually help it optimize its sales, marketing and operating expenditure.

Splunk is upbeat about its cloud offerings and the SDKs released for developers and integrators to hook into its solutions. At the same time, however, Splunk is proprietary and does not come across as a big proponent of open source software. Further, Splunk Enterprise is known to be relatively expensive; by its own admission, “Our pricing method may ultimately result in a higher total cost to users as data volumes increase over time”. It will continue to face competitive pressure from multiple sources. Many internet companies and startups have come up with tools and utilities to analyze logs by leveraging Hadoop. Notwithstanding its partnerships with Hadoop-focused companies like Cloudera, Hortonworks, MapR and Pentaho, there are now other viable Hadoop-based alternatives to Splunk’s functions. Security remains a strong point in Splunk’s offering but faces direct competitive heat from large IT vendors. Web analytics and Business Intelligence suites will continue to challenge its market positioning. Overall, given its current customer base and reach, we expect Splunk to continue attracting market traction, though more at the enterprise end of the spectrum and less among internet and e-commerce companies.
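The Hadoop-based log analysis tools mentioned above typically reduce to a mapper/reducer pair over raw log files, in the style of Hadoop Streaming. The sketch below drives such a pair in-process for illustration; the log lines and field positions are hypothetical, and a real job would run the same two functions across a cluster.

```python
# A Hadoop Streaming style mapper/reducer pair for counting HTTP
# status codes in access logs, driven in-process here. The log format
# (status code as the second whitespace-separated field) is assumed.
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Emit (status_code, 1) for each access-log line.
    fields = line.split()
    yield fields[1], 1

def reducer(pairs):
    # Sum counts per key, as a streaming reducer would over sorted input.
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, sum(count for _, count in group)

log = ["/index 200", "/login 500", "/index 200"]
pairs = [kv for line in log for kv in mapper(line)]
result = dict(reducer(pairs))
print(result)  # {'200': 2, '500': 1}
```

The appeal of this model for Splunk's competitors is that it scales out on commodity hardware over data already in HDFS, with no per-volume licensing.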
