Automatic Derivation of Statistical Algorithms: The EM Family and Beyond

Alexander G. Gray
Carnegie Mellon University
agray@cs.cmu.edu

Bernd Fischer and Johann Schumann
RIACS / NASA Ames
{fisch,schumann}@email.arc.nasa.gov

Wray Buntine
Helsinki Inst. of Information Technology
wray.buntine@hiit.fi


Abstract:

Machine learning has reached a point where most probabilistic methods can be understood as variations, extensions, and combinations of a much smaller set of abstract themes, e.g., as different instances of the EM algorithm. This enables the systematic derivation of algorithms customized for different models. Here, we demonstrate the AutoBayes system, which takes a high-level statistical model specification, uses powerful symbolic techniques based on schema-based program synthesis and computer algebra to derive an efficient specialized algorithm for learning that model, and generates executable code implementing that algorithm. This capability goes far beyond that of code collections such as Matlab toolboxes, or even tools for model-independent inference such as BUGS, which performs Gibbs sampling: complex new algorithms can be generated without new programming, algorithms can be highly specialized and tightly crafted for the exact structure of the model and data, and efficient, commented code can be generated for different languages or systems. We present examples of automatically derived algorithms ranging from closed-form solutions of Bayesian textbook problems to recently proposed EM algorithms for clustering, regression, and a multinomial form of PCA.
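
To make the flavor of such derived algorithms concrete, the sketch below shows the kind of specialized EM procedure that AutoBayes derives for a simple Gaussian mixture clustering model: a closed-form E-step computing class responsibilities, alternated with closed-form M-step parameter updates. This is an illustrative, hand-written Python sketch, not actual AutoBayes output; the function name `em_gaussian_mixture` and all initialization choices are assumptions made here for exposition.

```python
import numpy as np

def em_gaussian_mixture(x, k, n_iter=100, seed=0):
    """EM for a one-dimensional k-component Gaussian mixture.

    Illustrative sketch only: alternates a closed-form E-step
    (posterior class responsibilities) with closed-form M-step
    updates, the structure a derivation system like AutoBayes
    produces symbolically for this model.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    phi = np.full(k, 1.0 / k)                  # mixing weights
    mu = rng.choice(x, size=k, replace=False)  # initial means drawn from data
    sigma = np.full(k, np.std(x))              # pooled initial spread

    for _ in range(n_iter):
        # E-step: responsibility r[i, j] of component j for point i
        # (the constant -0.5*log(2*pi) cancels in the normalization).
        log_p = (-0.5 * ((x[:, None] - mu) / sigma) ** 2
                 - np.log(sigma) + np.log(phi))
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: closed-form updates weighted by the responsibilities.
        nk = r.sum(axis=0)
        phi = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        sigma = np.maximum(sigma, 1e-8)  # guard against collapsed components

    return phi, mu, sigma
```

The point of a system like AutoBayes is that this entire loop, including the closed-form M-step updates, is derived symbolically from the model specification and emitted as commented code, rather than being written by hand as above.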

