Variational Extensions to EM and Multinomial PCA
Helsinki Inst. of Information Technology
HIIT, P.O. Box 9800
FIN-02015 HUT, Finland
To appear in ECML 2002.
Several authors in recent years have proposed discrete analogues to
principal component analysis intended to handle discrete or
positive-only data, for instance for analyzing sets of documents.
Methods include non-negative matrix factorization, probabilistic
latent semantic analysis, and latent Dirichlet allocation. This paper
begins with a review of the basic theory of the variational extension
to the expectation-maximization algorithm, and then presents discrete
component finding algorithms in that light. Experiments are conducted
on both bigram word data and document bag-of-words data to expose some of
the subtleties of this new class of algorithms.
Last modified: Fri May 31 11:50:30 EEST 2002