Fri Mar 2
Arto Klami
Generative models that discover dependencies between data sets

We study a data fusion problem in which the aim is to find dependencies between two (or, in general, more) data sets with co-occurring paired samples. The underlying motivation is that if several measurements have been designed to capture the same phenomenon from different views, then what the measurements have in common is interesting. Variation that occurs in only one of the data sets is assumed to be noise in this context, and when working with small data sets we want to avoid modeling it.

Traditionally, dependencies are sought by explicitly maximizing a dependency measure, such as correlation or mutual information. Approaches of this kind, however, are known to overfit severely, and consequently rather simple models need to be used. Recently, a probabilistic interpretation of canonical correlation analysis (CCA) was presented, opening the way for more robust methods for the same task. In this talk I will present an extended version of the original generative model of CCA, describe a clustering model using the same underlying principle, and discuss a fully Bayesian variant of CCA.
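
As a rough illustration of the setting (not the models presented in the talk), the sketch below draws paired samples from a Gaussian model in which both data sets depend on a shared latent variable, in the spirit of probabilistic CCA, and then estimates classical canonical correlations by whitening and SVD. The function names, dimensions, noise level, and the small ridge term `reg` are all assumptions made for this example.

```python
import numpy as np


def sample_shared_latent(n, dx, dy, dz, noise=0.3, seed=0):
    """Draw paired samples (x, y) that depend on a shared latent z.

    Hypothetical helper: z ~ N(0, I), x = z Wx + noise, y = z Wy + noise.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, dz))
    wx = rng.standard_normal((dz, dx))
    wy = rng.standard_normal((dz, dy))
    x = z @ wx + noise * rng.standard_normal((n, dx))
    y = z @ wy + noise * rng.standard_normal((n, dy))
    return x, y


def canonical_correlations(x, y, reg=1e-8):
    """Classical CCA: singular values of the whitened cross-covariance."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    # Sample covariances; reg is an assumed ridge term for numerical stability.
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    # Whiten with Cholesky factors: m = Lx^{-1} Cxy Ly^{-T}.
    lx = np.linalg.cholesky(cxx)
    ly = np.linalg.cholesky(cyy)
    m = np.linalg.solve(lx, cxy)
    m = np.linalg.solve(ly, m.T).T
    # Singular values are returned in descending order.
    return np.linalg.svd(m, compute_uv=False)


if __name__ == "__main__":
    x, y = sample_shared_latent(n=2000, dx=5, dy=4, dz=2)
    print(canonical_correlations(x, y))
```

With a two-dimensional shared latent variable, one would expect roughly two large canonical correlations and the rest near zero. On small samples the same estimator tends to report spuriously high correlations, which is the overfitting problem that motivates the probabilistic treatment.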


