Fraud analysis is one of the most often quoted use cases for Hadoop. Here we look at the topic further to explore how Hadoop ecosystem products can be used.
Fraud analytics can be divided into three further use cases:
1- Fraud detection: determining whether a fraud is taking place or has occurred in the past, and generating an appropriate alert for it.
2- Fraud prevention: implementing controls and access restrictions to prevent fraud.
3- Fraud reduction: monitoring and predicting patterns to minimize the chances of fraud occurring.
Listed below are some of the methods that can be implemented using Hadoop to support any of the three use cases above.
1- Deduplication -
a) Entity matching - This could include exact or similar matching of entity attributes such as name, father's name or contact information (phone, e-mail id, street, city), or phonetic matching, using deduplication methods. Since this is a data-intensive exercise that requires matching against a previously built index, Hadoop is a natural technology fit.
b) Social network identity matching - Not yet very common, but emerging of late, is a tendency to match social network profiles with customer identity. While this technique could be quite effective provided you have the right social network data feeds, please be aware of privacy laws that may be applicable.
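As a rough illustration of the entity matching described in (a), the sketch below classifies a pair of names as an exact, similar or phonetic match. It uses only the Python standard library. The function names (normalize, soundex, match_type) and the 0.85 similarity threshold are assumptions made for this example, not part of any Hadoop product, and Soundex is just one of several phonetic schemes that could be used.

```python
from difflib import SequenceMatcher

# Soundex digit for each consonant; vowels, h, w and y have no code.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def normalize(text):
    """Lower-case and collapse whitespace so trivial differences vanish."""
    return " ".join(text.lower().split())


def soundex(name):
    """Four-character Soundex code, e.g. 'Robert' and 'Rupert' share one."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (encoded + "000")[:4]


def match_type(a, b, threshold=0.85):
    """Return 'exact', 'similar', 'phonetic' or None for two names."""
    na, nb = normalize(a), normalize(b)
    if na == nb:
        return "exact"
    if SequenceMatcher(None, na, nb).ratio() >= threshold:
        return "similar"
    if soundex(na) == soundex(nb):
        return "phonetic"
    return None


if __name__ == "__main__":
    for pair in (("John Smith", "john  smith"),
                 ("Jon Smith", "John Smith"),
                 ("Robert", "Rupert")):
        print(pair, match_type(*pair))
```

On a cluster, one plausible design is to use the phonetic code as the grouping key, so that only records sharing a key are compared pairwise; that choice is an assumption here and would depend on the data.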
2- Outlier detection -
An outlier is usually a deviation from the common usage pattern of a customer or transaction set. Using custom machine learning algorithms or available libraries, we would combine data to reveal any outlier points. Clustering and probabilistic distributions, along with visualization techniques, are the more common methods for deriving outliers. These may be used in conjunction with techniques like path analysis, sessionization, tokenization and attribution. Regression, correlation, averages and graph analysis may also be employed, depending on the functional requirement.
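To make the idea of a deviation from a customer's usual pattern concrete, here is a minimal sketch that flags transaction amounts far from that customer's own median. It uses a modified z-score based on the median absolute deviation, which is a swapped-in statistical technique chosen for brevity rather than one of the clustering methods named above. The function name flag_outliers, the (customer, amount) record shape and the 3.5 cut-off are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median


def flag_outliers(transactions, threshold=3.5):
    """Return (customer, amount) pairs that deviate strongly from the
    customer's own median amount.

    transactions: iterable of (customer_id, amount) tuples (assumed shape).
    """
    by_customer = defaultdict(list)
    for customer, amount in transactions:
        by_customer[customer].append(amount)

    flagged = []
    for customer, amounts in by_customer.items():
        med = median(amounts)
        # Median absolute deviation: a spread measure that a single
        # extreme value cannot inflate the way it inflates a std deviation.
        mad = median(abs(a - med) for a in amounts)
        if mad == 0:
            continue  # no spread to measure against; skip this customer
        for a in amounts:
            score = 0.6745 * (a - med) / mad  # modified z-score
            if abs(score) > threshold:
                flagged.append((customer, a))
    return flagged


if __name__ == "__main__":
    sample = [("c1", 20), ("c1", 22), ("c1", 19), ("c1", 21), ("c1", 500),
              ("c2", 100), ("c2", 105), ("c2", 98)]
    print(flag_outliers(sample))
```

The per-customer grouping step maps naturally onto a group-by-key job, with the scoring done once all of a customer's amounts are collected.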
3- Workflow -
Transaction streaming, monitoring, alert forwarding, alert disposal and transaction blocking could be among the steps that a custom workflow implements in a fraud management system. Considering the massive volume of transactions, a custom DSL workflow may be implemented on top of Hadoop.
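The workflow steps above can be sketched as a small in-memory pipeline. This is only a toy model under stated assumptions: the Transaction and Workflow classes, the two amount limits and the alert dictionary layout are all hypothetical, and a real system would read from a stream and persist alerts rather than keep them in lists.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    tx_id: str
    account: str
    amount: float


@dataclass
class Workflow:
    alert_limit: float  # amounts above this raise an alert for review
    block_limit: float  # amounts above this are blocked as well
    alerts: list = field(default_factory=list)
    blocked: list = field(default_factory=list)

    def monitor(self, tx):
        """Monitoring step: apply the limits to one transaction."""
        if tx.amount > self.block_limit:
            self.blocked.append(tx.tx_id)  # transaction blocking
            self.forward_alert(tx, "blocked")
        elif tx.amount > self.alert_limit:
            self.forward_alert(tx, "review")

    def forward_alert(self, tx, action):
        """Alert forwarding step: record an open alert."""
        self.alerts.append({"tx_id": tx.tx_id, "action": action, "open": True})

    def dispose_alert(self, tx_id):
        """Alert disposal step: close the alerts for a transaction."""
        for alert in self.alerts:
            if alert["tx_id"] == tx_id:
                alert["open"] = False

    def run(self, stream):
        """Transaction streaming step: consume any iterable of transactions."""
        for tx in stream:
            self.monitor(tx)


if __name__ == "__main__":
    wf = Workflow(alert_limit=1000.0, block_limit=5000.0)
    wf.run([Transaction("t1", "a", 50.0),
            Transaction("t2", "a", 1500.0),
            Transaction("t3", "b", 9000.0)])
    wf.dispose_alert("t2")
    print(wf.blocked, wf.alerts)
```

Keeping each step as a separate method is what would let a DSL describe the order and conditions of the steps without changing the step code itself.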
Some of the key advantages that we see with Hadoop usage in fraud management systems include, but are not limited to:
1. Quick loading of data with tools like Flume
2. No need for a defined schema; custom scripts/programs can be used to explore the data instead
3. Reduced need for a data warehouse, since raw multi-structured data can be used as-is
4. Faster processing of data, which reduces the fraud detection time frame
5. Elimination of DB overheads like indexes and backups
Further implementation evidence is needed to see whether a rule engine can also be built on top of a DSL framework. Overall, we expect a comprehensive fraud management system to have a hybrid architecture involving a rule engine, streams, workflow, dashboard, portal and Hadoop-based analytics. Implementations will vary based on the organization's current architecture and tool set preferences.
----------------------------------------------------------------------------
top image: wolf in sheep clothing; source: freedigitalphotos.net