Multi-source Probabilistic Inference

The Multi-source Probabilistic Inference group develops probabilistic machine learning models and inference techniques for analyzing and understanding complex heterogeneous data collections. For most data analysis tasks it is beneficial to jointly analyze all available data, but often the different data sources are not directly commensurable. For example, a data scientist studying the demographics of a neighborhood might have static spatial information about the buildings, dynamic group-level information on public transportation, large collections of time-stamped and user-specific social media content both as text and as images, and perhaps even some interview questionnaires. All of these sources provide information on the demographics, but standard modeling tools offer little help in combining them into an overall picture.

The goal of this research group is to overcome the theoretical and practical challenges of integrating such heterogeneous data sources. We build statistical models for various types of data, and especially hierarchical models for analyzing them jointly, even when there is no obvious way of linking the sources with each other.

Research themes

  • Multi-view learning, data integration, cross-domain object matching
  • Approximate Bayesian inference: MCMC, variational approximation
  • Nonparametric Bayesian modeling
  • Scalable probabilistic models, probabilistic programming

Projects

  • Traces of Information: Intelligence from fragmented data (Academy of Finland, 2013-2019)
  • Scalable Probabilistic Analytics (Tekes, 2016-2017)
  • ILCIS: Improved Learning by Combining Information Sources (Xerox Foundation, 2013-2015)

Members

  • Arto Klami, PhD, Principal investigator, Academy Research Fellow
  • Aditya Jitta, Doctoral student
  • Joseph Sakaya, Doctoral student
  • Ville Hyvönen, Doctoral student (jointly supervised with Teemu Roos)
  • Jarkko Lagus, Doctoral student
  • Krista Longi, Doctoral student
  • Sandeep Panchamukhi, Research assistant

Alumni (MSc+)

  • Liye He, MSc
  • Johannes Sirola, MSc

Selected recent publications

  • Probabilistic size-constrained microclustering. Arto Klami and Aditya Jitta. In Proceedings of Uncertainty in Artificial Intelligence (UAI), 2016. [pdf]
  • Using regression makes extraction of shared variation in multiple datasets easy. Jussi Korpela, Andreas Henelius, Lauri Ahonen, Arto Klami, and Kai Puolamäki. Data Mining and Knowledge Discovery, 2016. [html]
  • Towards brain-activity-controlled information retrieval: Decoding image relevance from MEG signals. Jukka-Pekka Kauppi, Melih Kandemir, Veli-Matti Saarinen, Lotta Hirvenkari, Lauri Parkkonen, Arto Klami, Riitta Hari, and Samuel Kaski. NeuroImage, 2015. [doi]
  • Group factor analysis. Arto Klami, Seppo Virtanen, Eemeli Leppäaho, and Samuel Kaski. IEEE Transactions on Neural Networks and Learning Systems, 2015. [preprint]
  • Latent-feature regression for multivariate count data. Arto Klami, Johannes Sirola, Lauri Väre, Abhishek Tripathi, and Frederic Roulland. In Proceedings of Artificial Intelligence and Statistics, 2015. [pdf]
  • Group-sparse embeddings in collective matrix factorization. Arto Klami, Guillaume Bouchard, and Abhishek Tripathi. In Proceedings of International Conference on Learning Representations, 2014.
  • Bayesian object matching. Arto Klami. Machine Learning, 92(2):225-250, 2013. [doi:10.1007/s10994-013-5357-4, preprint]
  • Bayesian canonical correlation analysis. Arto Klami, Seppo Virtanen, and Samuel Kaski. Journal of Machine Learning Research, 14:965-1003, 2013. [pdf]

Last updated on 19 Oct 2016 by Arto Klami - Page created on 21 Jan 2014 by Arto Klami