History

Jan. 9

Unsupervised Machine Learning for Matrix Decomposition

Abstract: Unsupervised learning is a classical approach in pattern recognition and data analysis. Its importance is growing today, due to increasing data volumes and the difficulty of obtaining statistically sufficient amounts of labelled training data. Typical analysis techniques using unsupervised learning are principal component analysis, independent component analysis, and cluster analysis. They can all be presented as decompositions of the data matrix containing the unlabelled samples. Starting from the classical results, the author reviews some advances in the field up to the present day.

Speaker: Erkki Oja

Affiliation: Professor Emeritus, Aalto University

Place of Seminar: Aalto University

Slides
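
As a rough illustration of the matrix-decomposition view mentioned in the abstract above, the sketch below approximates a small data matrix by its mean plus a rank-1 term (scores times one principal direction). It is not taken from the talk: the function name, the toy data, and the use of plain power iteration are assumptions chosen to keep the example short and dependency-free.

```python
import math

def first_principal_component(rows, iterations=200):
    """Leading principal direction of a small data matrix via power iteration.

    Hypothetical helper for illustration; not the speaker's method.
    """
    n, d = len(rows), len(rows[0])
    # Center each column so the decomposition describes variation around the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Covariance matrix C = X^T X / n of the centered data matrix X.
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)]
           for a in range(d)]
    # Assumed starting vector; it must not be orthogonal to the leading direction.
    w = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * w[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    # Scores: projection of each centered sample onto the direction w.
    scores = [sum(r[j] * w[j] for j in range(d)) for r in centered]
    return w, scores, means

# Toy data lying exactly on the line y = 2x, so one component explains everything.
rows = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
w, scores, means = first_principal_component(rows)

# Rank-1 reconstruction of the data matrix: mean + score * w for each sample.
approx = [[means[j] + s * w[j] for j in range(2)] for s in scores]
```

For data that does not lie on a line, the same reconstruction is only an approximation, and further components would be needed to reduce the error.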


Jan. 16

Probabilistic Programming: Bayesian Modeling Made Easy

Abstract: Probabilistic models are principled tools for understanding data, but the difficulty of inference limits the complexity of the models we can actually use. Often we need to develop specific inference algorithms for new models (which might take months), and need to restrict ourselves to tractable model families that might not match our beliefs about the data. Probabilistic programming promises to fix this by separating the model description from the inference: with probabilistic programming languages we can specify complex models using a high-level programming language, letting a black-box inference engine take care of the tricky details. This talk covers the basic idea of probabilistic programming and discusses how well its promises hold now and in the future.

Speaker: Arto Klami

Affiliation: Academy Research Fellow, University of Helsinki

Place of Seminar: University of Helsinki

Slides


Jan. 23

Metabolite Identification Through Machine Learning

Abstract: Identification of small molecules from biological samples remains a major bottleneck in understanding the inner workings of biological cells and their environment. Machine learning on data from large public databases of tandem mass spectrometric data has transformed this field in recent years, with identification rates increasing by 150%. In this presentation, I will outline the key machine learning methods behind this development: kernel-based learning of molecular fingerprints, multiple kernel learning, and structured prediction, as well as some recent advances.

Speaker: Juho Rousu

Affiliation: Associate Professor, Aalto University

Place of Seminar: Aalto University

Slides


Jan. 30

Likelihood-free Inference and Predictions for Computational Epidemiology

Abstract: Simulator-based models often allow inference and predictions under more realistic assumptions than those employed in standard statistical models. For example, the observation model for an underlying stochastic process can be more freely chosen to reflect the characteristics of the data gathering procedure. A major obstacle for such models is the intractability of the likelihood, which has to a large extent hampered their practical applicability. I will discuss recent advances in likelihood-free inference that greatly accelerate the model fitting process by exploiting a combination of machine learning techniques. Applications to several novel models in infectious disease epidemiology are used to illustrate the potential offered by this approach.

Speaker: Jukka Corander

Affiliation: Professor, University of Helsinki and University of Oslo

Place of Seminar: University of Helsinki

Slides
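
To make the idea of likelihood-free inference in the abstract above concrete, here is a minimal sketch of plain rejection sampling (approximate Bayesian computation), the textbook baseline rather than the accelerated methods the talk refers to. The simulator, the uniform prior, the tolerance, and all names are assumptions made for illustration; the toy model is a binomial count, not an epidemiological model.

```python
import random

def simulate(p, n, rng):
    """Hypothetical simulator: number of successes in n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def abc_rejection(observed, n, draws, tolerance, seed=0):
    """Keep prior draws whose simulated data land close to the observed data.

    No likelihood is evaluated; only forward simulation is used.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(draws):
        p = rng.random()  # assumed prior: uniform on [0, 1)
        if abs(simulate(p, n, rng) - observed) <= tolerance:
            accepted.append(p)
    return accepted

# Assumed toy data: 30 successes observed in 100 trials.
samples = abc_rejection(observed=30, n=100, draws=20000, tolerance=2)
posterior_mean = sum(samples) / len(samples)
```

The accepted values approximate the posterior of `p`. Shrinking `tolerance` tightens the approximation but rejects more simulations, which is the cost that faster likelihood-free methods aim to reduce.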


Feb. 6

Towards Perfect Density Estimation

Abstract: We start by addressing a very simple problem, estimation of a one-dimensional density function, and argue that despite the apparent simplicity of the problem, it is surprisingly difficult to solve it in a holistic manner that is both computationally feasible and theoretically justifiable without strong distributional or other assumptions. We demonstrate how the information-theoretic MDL framework can be used for reaching this goal (almost) perfectly, and show how this simple setup gives interesting perspectives on the fundamental concepts in probabilistic modelling and statistical inference. We also discuss ideas for extending the framework to more complex models with additional practical applications.

Speaker: Petri Myllymäki

Affiliation: Professor, University of Helsinki

Place of Seminar: Aalto University

Slides


Feb. 13

Variable Selection From Summary Statistics

Abstract: With increasing capabilities to measure a massive number of variables, efficient variable selection methods are needed to improve our understanding of the underlying data-generating processes. This is evident, for example, in human genomics, where genomic regions showing association with a disease may contain thousands of highly correlated variants, while we expect that only a small number of them are truly involved in the disease process. I outline recent ideas that have made variable selection practical in human genomics and demonstrate them through our experiences with the FINEMAP algorithm (Benner et al. 2016, Bioinformatics).

(1) Compressing data to lightweight summaries to avoid logistics and privacy concerns related to complete data sharing and to minimize the computational overhead.

(2) Efficient implementation of sparsity assumptions.

(3) Efficient stochastic search algorithms.

(4) Use of public reference databases to complement the available summary statistics.

Speaker: Matti Pirinen

Affiliation: Academy Research Fellow, Institute for Molecular Medicine Finland, University of Helsinki

Place of Seminar: University of Helsinki


Feb. 20

Compressed Sensing for Semi-Supervised Learning From Big Data Over Networks

Abstract: In this talk I will present some of our most recent work on the application of compressed sensing to semi-supervised learning from massive network-structured datasets, i.e., big data over networks. We expect the use of compressed sensing ideas to be game-changing for machine learning from big data in a similar manner as it was for digital signal processing. In particular, I will present a sparse label propagation algorithm which efficiently learns from large amounts of network-structured unlabelled data by leveraging the information provided by a few initially labelled training data points. This algorithm is inspired by compressed sensing recovery methods and allows for a simple sufficient condition on the network structure which guarantees accurate learning.

Speaker: Alexander Jung

Affiliation: Assistant Professor, Aalto University

Place of Seminar: Aalto University

Slides



Last updated on 20 Feb 2017 by Homayun Afrabandpey - Page created on 12 Dec 2016 by Homayun Afrabandpey