14.11.2007: Guest lecture by Dimitris Mavroeidis

Dimitris Mavroeidis from Athens University of Economics and Business,
Greece, will give a second talk

Wednesday, Nov 14, at 10:15, in C222:

Stability based Sparse LSI/PCA, with extensions to K-Means and Spectral Clustering.


The stability of sample-based algorithms is a concept commonly used for parameter tuning and validity assessment. In this presentation we focus on two well-studied algorithms, LSI and PCA, and propose a feature selection process that provably guarantees the stability of their outputs. The feature selection process is performed such that the level of (statistical) accuracy of the LSI/PCA input matrices is adequate for computing meaningful (stable) eigenvectors. The feature selection process "sparsifies" LSI/PCA: the instances are projected onto the eigenvectors of a principal submatrix of the original input matrix, which produces sparse factor loadings that are linear combinations solely of the selected features. We utilize bootstrap confidence intervals for assessing the statistical accuracy of the input sample matrices, and matrix perturbation theory in order to relate the statistical accuracy to the stability of eigenvectors. Finally, we demonstrate that the proposed methodological approach can be extended to handle K-Means and Spectral Clustering.
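
To make the idea of PCA on a principal submatrix concrete, here is a minimal Python sketch. It is not the speaker's algorithm. The abstract does not say how the features are chosen, so the sketch takes the selected feature indices as given. It also measures eigenvector stability directly by bootstrap resampling, whereas the talk bounds it through matrix perturbation theory. The function names and parameters (`sparse_pca_on_submatrix`, `bootstrap_eigvec_stability`, `selected`, `n_boot`) are hypothetical and used only for illustration.

```python
import numpy as np


def sparse_pca_on_submatrix(X, selected, k=1):
    """PCA restricted to the features in `selected` (assumed to be given).

    Returns sparse loadings of shape (n_features, k), zero outside
    `selected`, and the projection of the centered data onto them.
    """
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    # Principal submatrix: rows and columns of the selected features only.
    sub = cov[np.ix_(selected, selected)]
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, vecs = np.linalg.eigh(sub)
    top = vecs[:, ::-1][:, :k]
    loadings = np.zeros((X.shape[1], k))
    loadings[selected, :] = top
    return loadings, Xc @ loadings


def bootstrap_eigvec_stability(X, selected, n_boot=200, seed=0):
    """Percentile interval (2.5%, 97.5%) of |cosine| between the leading
    sparse eigenvector of the full sample and of bootstrap resamples."""
    rng = np.random.default_rng(seed)
    ref, _ = sparse_pca_on_submatrix(X, selected)
    n = X.shape[0]
    cosines = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        boot, _ = sparse_pca_on_submatrix(X[idx], selected)
        # Absolute value, because eigenvectors are defined only up to sign.
        cosines[b] = abs(ref[:, 0] @ boot[:, 0])
    return np.percentile(cosines, [2.5, 97.5])
```

Calling `bootstrap_eigvec_stability(X, [0, 1, 2])` on a data matrix `X` with instances as rows returns an interval close to 1 when the leading eigenvector of the selected submatrix is stable under resampling. A wide interval, or one far below 1, suggests the sample is not accurate enough to support that eigenvector.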


Last updated on 17 Dec 2007 by Martti Mäntylä - Page created on 14 Nov 2007 by Teija Kujala