Automatic Derivation of Statistical Algorithms: The EM Family and Beyond
Alexander G. Gray
Carnegie Mellon University
agray@cs.cmu.edu
Bernd Fischer and Johann Schumann
RIACS / NASA Ames
{fisch,schumann}@email.arc.nasa.gov
Wray Buntine
Helsinki Inst. of Information Technology
wray.buntine@hiit.fi
Submitted
Postscript draft.
PDF draft.
Expect significant revisions!
Abstract:
Machine learning has reached a point where most probabilistic methods
can be understood as variations, extensions and combinations of a much
smaller set of abstract themes, e.g., as different instances of the EM
algorithm. This enables the systematic derivation of algorithms
customized for different models. Here, we demonstrate the AutoBayes
system which takes a high-level statistical model specification, uses
powerful symbolic techniques based on schema-based program synthesis
and computer algebra to derive an efficient specialized algorithm for
learning that model, and generates executable code implementing that
algorithm. This capability is far beyond that of code collections
such as Matlab toolboxes or even tools for model-independent
optimization such as BUGS for Gibbs sampling: complex new algorithms
can be generated without new programming, algorithms can be highly
specialized and tightly crafted for the exact structure of the model
and data, and efficient and commented code can be generated for
different languages or systems. We present examples of
automatically-derived algorithms ranging from closed-form solutions of
Bayesian textbook problems to recently-proposed EM algorithms for
clustering, regression, and a multinomial form of PCA.
Wray Buntine
Last modified: Wed Jul 10 10:13:07 EEST 2002