
Detecting DDoS hacking attempts with MapReduce and Hadoop

Distributed Denial of Service (DDoS) attacks are among the most common security hacking attempts, aimed at making computation resources unavailable or impairing geographical networks. To analyze such attack patterns in network usage, Hadoop and MapReduce can step in. While MapReduce detects packet traffic anomalies, the scalable Hadoop architecture offers a way to process the data within a reasonable response time.

In a paper by Y. Lee and Y. Lee, Detecting DDoS Attacks with Hadoop (ACM CoNEXT Student Workshop, 2011), the authors present MapReduce-based algorithms for packet analysis that leverage Hadoop for parallel processing.

Two distinct algorithms have been proposed:
- Counter-based method: This method relies on three key parameters: the time interval, which is the duration during which packets are analyzed; the threshold, which indicates the frequency of requests; and the unbalance ratio, which denotes the anomaly ratio of responses per page requested between a specific client and server.

“The masked timestamp with time interval is used for counting the number of requests from a specific client to the specific URL within the same time duration. The reduce function summarizes the number of URL requests, page requests, and server responses between a client and a server. Finally, the algorithm aggregates values per server.”

When the threshold is crossed and the unbalance ratio is higher than normal, the clients are marked as attackers.
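To make the flow concrete, here is a plain-Python sketch of the counter-based method. This is not the authors' implementation: the record fields, the in-memory grouping that stands in for the shuffle phase, and all three parameter values are assumptions (the paper leaves threshold determination open).

```python
from collections import defaultdict

# Hypothetical parameter values; the paper does not specify how to choose them.
TIME_INTERVAL = 60        # seconds: timestamps are masked into fixed windows
REQUEST_THRESHOLD = 100   # requests per (client, server) pair per window
UNBALANCE_RATIO = 10.0    # URL requests per distinct page considered anomalous

def map_packet(packet):
    # Mask the timestamp with the time interval so all packets in the same
    # window share a key, then key by (window, client, server).
    window = packet["timestamp"] // TIME_INTERVAL
    return (window, packet["client"], packet["server"]), packet["url"]

def reduce_counts(urls):
    # Summarize the number of URL requests and distinct pages for one key.
    total = len(urls)
    pages = len(set(urls))
    return total, total / pages

def detect_attackers(packets):
    # In-memory stand-in for the MapReduce shuffle: group values by key.
    groups = defaultdict(list)
    for p in packets:
        key, url = map_packet(p)
        groups[key].append(url)
    attackers = set()
    for key, urls in groups.items():
        total, ratio = reduce_counts(urls)
        if total > REQUEST_THRESHOLD and ratio > UNBALANCE_RATIO:
            attackers.add(key[1])  # flag the client
    return attackers
```

A client hammering a single URL hundreds of times within one window exceeds both the threshold and the unbalance ratio and is flagged, while a client fetching a handful of distinct pages is not.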

The key advantage of this algorithm is its low complexity. The authors also indicate that determining the threshold value could be a key deciding factor in the implementation, but offer no further guidance on how to determine it. Based on our knowledge of other Hadoop implementations, the same packet traffic data could be a rich mine for extracting the threshold value and unbalance ratio: we have seen elsewhere how Hadoop can be effectively used to analyze logs and arrive at statistical trends and patterns.

- Access-pattern-based method: Speaking of patterns, the authors move on to the next algorithm for detecting an attack. Here they rely on a pattern that differentiates normal traffic from DDoS traffic.
This method requires more than two MapReduce jobs: “the first job obtains access sequence to the web page between a client and a web server and calculates the spending time and the bytes count for each request of the URL; the second job hunts out infected hosts by comparing the access sequence and the spending time among clients trying to access the same server.”

What this essentially implies is that if two clients carry the same DDoS bot, they could be using the same access sequence (accessing resource A → B → C → … Z) and have a very high likelihood of spending the same amount of time and exchanging the same amount of data while accessing A, B, or Z. This indicates suspicious, bot-like behavior rather than normal human interaction. Remember, the analysis here is on HTTP GET requests, which are made more by human interaction than by bots.
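As a rough illustration of the two jobs, here is a plain-Python sketch (not the paper's code): `build_sequences` stands in for the first job and `find_bots` for the second; the record fields, function names, and timing tolerance are all assumptions.

```python
from collections import defaultdict

TIME_TOLERANCE = 0.05  # hypothetical: allowed relative spread in spending time

def build_sequences(requests):
    # Stand-in for the first job: per (client, server), order the requests by
    # time and record the URL access sequence plus the total spending time.
    per_client = defaultdict(list)
    for r in requests:
        per_client[(r["client"], r["server"])].append(r)
    profiles = {}
    for key, rs in per_client.items():
        rs.sort(key=lambda r: r["timestamp"])
        sequence = tuple(r["url"] for r in rs)
        spending_time = rs[-1]["timestamp"] - rs[0]["timestamp"]
        profiles[key] = (sequence, spending_time)
    return profiles

def find_bots(profiles, tolerance=TIME_TOLERANCE):
    # Stand-in for the second job: clients of the same server that share an
    # identical access sequence AND near-identical spending time look like
    # instances of the same bot rather than independent humans.
    by_pattern = defaultdict(list)
    for (client, server), (sequence, spent) in profiles.items():
        by_pattern[(server, sequence)].append((client, spent))
    bots = set()
    for members in by_pattern.values():
        if len(members) < 2:
            continue  # a pattern seen from one client only is not suspicious
        times = [t for _, t in members]
        mean = sum(times) / len(times)
        if all(abs(t - mean) <= tolerance * max(mean, 1e-9) for t in times):
            bots.update(client for client, _ in members)
    return bots
```

Two clients replaying the identical A → B → C sequence with matching timing get flagged together, while a lone human with its own sequence does not.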
Obviously such computation is expensive, and today's systems still have some way to go before they can perform this big data analysis in reasonable time. The other challenge with both methods is that Hadoop remains heavily oriented towards batch processing; many systems are trying to bring Hadoop closer to near-real-time operation. If you have a story to share about real-time Hadoop processing, drop a comment here or on the More section of this site.

The bigger challenge that the authors have tried to address here is scalability. By leveraging Hadoop for parallel data processing through its inherent master-slave node architecture, they have cut down processing time while being able to deal with ever larger volumes of incoming data.


  1. Last year I worked on designing and developing a real-time abnormal network traffic system for billing and preventing abnormal usage of networking resources at the Cloud center, Korea Telecom. Apache Hama was used. I can't share details, but here are a few slides.

    1. Thank you, Edward, for sharing your implementation experience. Your slide deck on SlideShare points to an interesting analytics methodology. We would love to see more details.

  2. Respected author,

    I am a college student doing a project on the implementation of this paper. I am still unclear about the input dataset. Please help. How can I contact you regarding this over mail?

  3. Thanks for your interest. You may refer to the contact coordinates mentioned in the paper.


