For purpose
of understanding matrix factorization, we tend to simplify the computation to:
p x r Matrix B
r x q Matrix C
for
achieving:
A ~= BC
and minimize L(A,B,C) computed over N training points
where, A is
a p x q input matrix and L is a loss function.
Among the
various algorithmic techniques available, the following are more popular:

Alternating Least Squares (ALS): as popularized in Netflix contest
Step 1: Initialize matrix C by assigning
the average rating for that movie as
the first row, and small random numbers for the
remaining entries.
Step 2: Fix C, Solve B by minimizing
the objective function (the sum of
squared errors);
Step 3: Fix B, solve C by minimizing the
objective function similarly;
Step 4: Repeat Steps 2
and 3 until a stopping criterion is satisfied.
 Expectation
Maximization: used in large scale collaborative filtering and demonstrated
experimentation on Google News.
The EM algorithm is an efficient iterative procedure
to compute the Maximum Likelihood (ML) estimate in the presence of missing or
hidden data…
Each iteration of the EM algorithm consists of two
processes: The Estep, and the Mstep. In the expectation, or Estep, the
missing data are estimated
given the observed data and current estimate of the
model parameters. This is achieved using the conditional expectation, explaining
the choice of terminology. In the Mstep, the likelihood function is maximized
under the assumption that the missing data are known. The estimate of the
missing data from the Estep are used in lieu of the actual missing
data.
 Distributed
non negative matrix factorization: a rescaled version of gradient
descent as tested on Internet Explorer ‘Suggested Sites’ function. In the distributed technique leveraging NMF, the data is partitioned along the long dimension
for parallelism.
Assuming
all matrices can be held in shared memory, multiplicative update rules are
applied to determine step sizes for each parameter.
 Stochastic
Gradient Descent (SGD): a well know technique which tends to compute direction
of steepest descent and then takes a step in that direction. Among the variants
include:
 Partitioned SGD: distribute without using stratification and run
independently and in parallel on partitions
 Pipelined SGD: based on ‘delayed update’ scheme
 Decentralized SGD: computation in decentralized and distributed fashion
Beyond
MapReduce as built in Apache Hadoop, there are a few other tools/frameworks/libraries
which are available in Hadoop ecosystem for implementing matrix factorization
techniques:
 Apache Mahout
Machine
Learning library which contains matrix factorization algorithm for collaborative
filtering implemented on top of Apache Hadoop using the map/reduce paradigm.
 Apache Giraph
Open source
graph processing framework inspired by Google Pregel which follows the bulk
synchronous parallel model and launched typically as Hadoop job.
 Ricardo
A scalable platform for deep analytics. Ricardo combines
the data management capabilities of Hadoop and Jaql with the statistical
functionality provided by R.
Source: Data Intensive Analytics with Hadoop: A Look Inside (refer
slide 33 onwards)
There are
further advances being made in the various projects in the Apache Hadoop
ecosystem. A recent incubator proposal for MRQL project also aims to leverage the BSP model of Apache Hama
to compute matrix functions.
Most, if
not all, techniques listed above can be effectively run in a distributed
environment using MapReduce and Hadoop only also. The same has been strongly
argued in a paper by Jimmy Lin.
‘MapReduce only’ approach may suffer from a few limitations like launch
overheads and may tend to be categorized batch only. However, it still may
score in front of multiple tools architecture in terms of avoiding integration
overheads. With a strong focus on low latency products and advances being made
in analytic accelerators, there is a possibility we may see more on matrix
factorization and collaborative filtering in coming months.

top image source: wikipedia

top image source: wikipedia
Comments
Post a Comment