Applying Discrete PCA in Data Analysis
Wray Buntine
Helsinki Inst. of Information Technology
HIIT, P.O. Box 9800
FIN-02015 HUT, Finland
wray.buntine@hiit.fi
Alex Jakulin
Faculty of Computer and Information Science
University of Ljubljana
Trzaska 25, SI-1001, Ljubljana, Slovenia
Jakulin@IEEE.ORG
Appeared in Uncertainty in AI, 2004.
A GPL-licensed test suite for this system is now available for trial; contact the authors.
Abstract:
Methods for analysis of principal components in discrete data have
existed for some time under various names such as grade of
membership modelling, probabilistic latent semantic analysis,
and genotype inference with
admixture. In this paper we explore a number of extensions
to the common theory, and present applications of these
methods to some common statistical tasks. We show that these
methods can be interpreted as a discrete version of ICA.
We develop a hierarchical version yielding components
at different levels of detail,
and additional techniques for Gibbs sampling.
We compare the algorithms on
a text prediction task using support vector machines,
and on information retrieval.
Last modified: Fri May 28 11:08:57 EEST 2004