### History of Previous Talks in Spring 2017

##### Learning Data Representation by Large-Scale Neighbor Embedding

**Date:** June 12, 2017

**Abstract:** Machine learning, now at the forefront of data science, increasingly influences our lives. Encoding data in a suitable vector space is the fundamental starting point for machine learning, and a good vector encoding should respect the relations among the data items. However, conventional methods that preserve pairwise or higher-order relationships are very slow, and consequently they can handle only small data sets. We have been developing a family of unsupervised methods called large-scale Neighbor Embedding (NE), which substantially accelerates the vector encoding. Our methods can thus learn low-dimensional vector representations for mega-scale data sets according to the neighborhoods of the data items in the original space. With efficient algorithms and a wealth of neighborhood information, large-scale NE significantly outperforms small-scale NE and many other existing approaches for learning data representations. Besides generic feature extraction, our work also delivers two important tools as special cases of NE, one for data visualization and one for cluster analysis; these scale up such applications by an order of magnitude and make visualization and clustering of data sets of today's size fast enough for interactive use. Because neighborhood information is naturally and massively available in many areas, our method has wide applications as a critical component in scientific research, next-generation DNA sequence analysis, natural language processing, educational cloud, financial data analysis, market studies, etc.

**Speaker:** Zhirong Yang

**Affiliation:** Department of Computer Science, Aalto University

**Place of Seminar:** Aalto University

##### Statistical Ecology with Gaussian Processes

**Date:** June 5, 2017

**Abstract:** Ecology studies the distribution and abundance of species and their interactions with other species and the environment. Key questions in ecology include which environmental factors and interspecies dependencies drive species distributions, how these processes together affect the structure of species communities, and how environmental changes, such as climate change, affect species distributions and communities. These questions are essentially about variable selection and causal and predictive inference; hence, statistics has a central role in answering them. The species distribution models (SDMs) used for these analyses are traditionally based on generalized linear and additive models. In this talk I will present how Gaussian processes (GPs) can be used in SDMs and what benefits and challenges this brings. I will present recent results on GP-based species distribution modeling in the Baltic Sea and the Great Barrier Reef, Australia. I will also discuss potential future developments and current challenges related to computation and model building.
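
The core computation behind GP regression can be sketched in a few lines. This is a minimal sketch assuming a zero-mean GP, a squared-exponential kernel and Gaussian observation noise; the kernel choice, the hyperparameter values and the function names `squared_exponential` and `gp_posterior` are illustrative assumptions, not the models from the talk. Presence/absence or abundance data in SDMs usually call for non-Gaussian likelihoods, which this sketch does not cover.

```python
import numpy as np

def squared_exponential(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs (assumed kernel)."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=0.01, **kernel_args):
    """Posterior mean and variance of a zero-mean GP with Gaussian noise."""
    K = squared_exponential(x_train, x_train, **kernel_args)
    K_s = squared_exponential(x_train, x_test, **kernel_args)
    K_ss = squared_exponential(x_test, x_test, **kernel_args)
    # Cholesky factor of the noisy training covariance: K + noise_var * I = L L^T
    L = np.linalg.cholesky(K + noise_var * np.eye(len(x_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    # Predictive variance of the latent function at the test inputs
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, var
```

For example, with `x = np.linspace(0, 5, 20)`, calling `gp_posterior(x, np.sin(x), np.array([2.5]))` returns a mean close to sin(2.5) and a small posterior variance, since the test input lies between densely spaced training inputs.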

**Speaker:** Jarno Vanhatalo

**Affiliation:** Professor of Statistics, University of Helsinki

**Place of Seminar:** University of Helsinki

##### Graphics Meets Vision Meets Machine Learning

**Date:** May 29, 2017

**Abstract:** Realistic three-dimensional modeling and animation are key bottlenecks in the production of films, games, VR, and other applications of computer graphics. In this talk, I will describe our recent research that uses machine learning techniques to solve hard inference problems in generating 3D content: capture and reproduction of photorealistic surface appearance, facial performance capture, and turning audio into facial animation. These works both push the state of the art forward in research (two of the three projects have been published at ACM SIGGRAPH) and are, surprisingly, already ready for production use.

**Speaker:** Jaakko Lehtinen

**Affiliation:** Professor of Computer Science, Aalto University

**Place of Seminar:** Aalto University

##### On Priors and Bayesian Variable Selection in Large p, Small n Regression

**Date:** May 22, 2017

**Abstract:** The Bayesian approach is well known for using priors to improve inference, but an equally important part is integration over the uncertainties. I first present recent developments in hierarchical shrinkage priors for expressing sparsity assumptions about covariate effects. I then present a projection predictive variable selection approach, a Bayesian decision-theoretic approach to variable selection that can preserve the essential information and uncertainties related to all variables in the study. I also present excellent recent experimental results and easy-to-use software.

**Speaker:** Aki Vehtari

**Affiliation:** Professor of Computer Science, Aalto University

**Place of Seminar:** University of Helsinki

##### Machine Learning for Image-Based Localization

**Date:** May 15, 2017

**Abstract:** Image-based localization refers to the problem of computing the camera position and orientation for a given query image with respect to a known visual 3D map of the scene. This problem is relevant for applications such as robot self-localization, pedestrian navigation, and augmented reality. A related problem is relative pose estimation between two camera views, which is required for computing image-based 3D models from a collection of 2D images. Traditionally, both problems have been approached using hand-crafted local image features and descriptors, such as the widely used SIFT keypoint detector. Recently, however, several deep-learning-based localization approaches have been proposed. They omit local feature matching and directly try to regress the camera pose. In this presentation, we will give an overview of the problem area and explore some recent deep-learning-based approaches. We will also present some of our own recent results in this area.

**Speaker:** Juho Kannala

**Affiliation:** Professor of Computer Science, Aalto University

**Place of Seminar:** Aalto University

##### Empirical Parameterization of Exploratory Search Systems Based on Bandit Algorithms

**Date:** May 8, 2017

**Abstract:** Exploratory search refers to situations in which a user has insufficient knowledge to define exact search criteria or otherwise does not know what they are looking for. Reinforcement learning techniques have demonstrated great potential for supporting exploratory search in information retrieval systems, as they allow the system to trade off exploration (presenting the user with alternative topics) and exploitation (moving toward more specific topics). Users of such systems, however, often feel that the system is not responsive to their needs. This problem is not inherent to such systems; it is caused by the exploration rate parameter being inappropriately tuned for a given system, dataset or user. In this talk, we discuss two approaches to optimising exploratory search systems based on bandit algorithms. First, we show that the trade-off between exploration and exploitation can be modelled as a direct relationship between the exploration rate parameter of the reinforcement learning algorithm and the number of relevant documents returned to the user over the course of a search session. We define the optimal exploration/exploitation trade-off as the point where this relationship is maximised, and show this point to be broadly concordant with user satisfaction and performance. Our second approach aims to dynamically adapt exploration and exploitation in a manner commensurate with the user’s individual requirements for each search session. We present a novel study design together with a regression model for predicting the optimal exploration rate based on simple metrics from the first iteration, such as clicks and reading time. We perform model selection based on the data collected from a user study and show that predictions are consistent with user feedback.
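
To make the role of the exploration rate concrete, here is a minimal epsilon-greedy bandit. It is a hypothetical stand-in: the abstract does not say which bandit algorithm the systems use. Arms play the role of topics, `epsilon` is the exploration rate, and the Bernoulli reward probabilities are made-up values.

```python
import random

def epsilon_greedy(reward_probs, epsilon, steps, seed=0):
    """Run an epsilon-greedy bandit on Bernoulli arms (hypothetical stand-in).

    epsilon is the exploration rate: with probability epsilon a random arm is
    tried (exploration); otherwise the arm with the highest estimated mean
    reward is chosen (exploitation), with ties going to the lowest index.
    Returns (pull counts per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1 if rng.random() < reward_probs[arm] else 0
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return counts, total
```

With `epsilon=0.0` the sketch never explores: it keeps pulling the first arm whatever the rewards, which is one way to see why a badly tuned exploration rate can make a system feel unresponsive.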

**Speaker:** Dorota Glowacka

**Affiliation:** Department of Computer Science, University of Helsinki

**Place of Seminar:** University of Helsinki

##### Nintendo Wii Fit-Based Balance Testing to Detect Sleep Deprivation: Approximate Bayesian Computation Approach

**Date:** April 24, 2017

**Abstract:** Sleep deprivation deteriorates health and causes accidents. Measuring a person’s postural steadiness may be used to determine his or her state of alertness. Posturographic measurements are easy to conduct: a person’s body sway is measured during upright stance on a balance board for 60 s. The Nintendo Wii Fit balance board is a portable and affordable alternative to expensive clinical force plates. Body sway may be modeled with a single-link inverted pendulum (Asai et al. 2009). The model parameters, such as *time delay* and *noise* intensity in the nervous system, are physiologically relevant. The pendulum is kept upright by controllers that include *stiffness* and *damping* gain parameters. The *level of control* determines how often the active controller is ON. Because the model cannot be solved analytically in closed form, inferring the model parameters and their confidence limits is nontrivial. We used the sequential Monte Carlo approximate Bayesian computation (SMC-ABC) algorithm to infer the model parameters. The inferred parameters may allow determining a person’s state of alertness.
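
SMC-ABC refines a simpler idea, rejection ABC, by running a sequence of particle populations with shrinking tolerances. The plain rejection variant is sketched below under loud assumptions: the simulator is a hypothetical Gaussian toy model standing in for the inverted-pendulum body-sway model, the sample mean is the summary statistic, and the uniform prior bounds and tolerance are arbitrary illustrative choices.

```python
import random
import statistics

def simulate(mu, n, rng):
    """Hypothetical toy simulator: n Gaussian observations with unknown mean mu."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]

def rejection_abc(observed, prior_low, prior_high, n_draws, tolerance, seed=0):
    """Basic rejection ABC with the sample mean as summary statistic.

    Draw a parameter from a uniform prior, simulate a data set of the same
    size, and keep the parameter only if the simulated summary lies within
    `tolerance` of the observed summary. No likelihood is ever evaluated.
    """
    rng = random.Random(seed)
    s_obs = statistics.mean(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(prior_low, prior_high)
        s_sim = statistics.mean(simulate(mu, len(observed), rng))
        if abs(s_sim - s_obs) < tolerance:
            accepted.append(mu)
    return accepted  # approximate posterior sample for mu
```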

**Speaker:** Aino Tietäväinen

**Affiliation:** Department of Physics, University of Helsinki

**Place of Seminar:** Aalto University

##### Learning With Spectral Kernels

**Date:** April 10, 2017

**Abstract:** Machine learning algorithms learn models that automatically infer data representations and generalise to new data. Gaussian processes are Bayesian kernel-based models with the key advantage of being able to learn kernel functions efficiently from data. All stationary kernel functions can be decomposed into sinusoidal components, which provide a highly expressive basis for learning arbitrary representations. In this talk I will discuss how we can exploit spectral kernel learning for large-scale multi-task learning. We also generalise spectral learning to learning non-stationary kernels with input-specific behaviour.
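
By Bochner's theorem, a stationary kernel is the Fourier transform of a spectral density, so modelling that density as a mixture of Gaussians yields a kernel built from damped sinusoids. The sketch below assumes the one-dimensional spectral mixture parameterisation of Wilson and Adams (2013); the multi-task and non-stationary extensions discussed in the talk are not shown, and the function name is hypothetical.

```python
import math

def spectral_mixture_kernel(tau, weights, means, variances):
    """1-D spectral mixture kernel k(tau) for a stationary GP (assumed form).

    k(tau) = sum_q w_q * exp(-2 pi^2 tau^2 v_q) * cos(2 pi tau mu_q)
    Each component is a Gaussian in the frequency domain with mean frequency
    mu_q and variance v_q; in the input domain it becomes a cosine
    (a sinusoidal component) damped by a Gaussian envelope.
    """
    total = 0.0
    for w, mu, v in zip(weights, means, variances):
        envelope = math.exp(-2.0 * math.pi ** 2 * tau ** 2 * v)
        total += w * envelope * math.cos(2.0 * math.pi * tau * mu)
    return total
```

With a single component of mean frequency 0, the kernel reduces to a squared-exponential kernel, `w * exp(-2 * pi**2 * tau**2 * v)`; components with nonzero mean frequencies add periodic structure.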

**Speaker:** Markus Heinonen

**Affiliation:** Department of Computer Science, Aalto University

**Place of Seminar:** University of Helsinki

##### Multilayer Networks

**Date:** April 3, 2017

**Abstract:** Network science has been very successful in investigations of a wide variety of applications, from biology and the social sciences to physics, technology, and more. In many situations, it is already insightful to use a simple (and typically naive) representation: a binary graph in which nodes are entities and unweighted edges encode the interactions between those entities. This allows one to use powerful methods and concepts, for example from graph theory, and numerous advances have been made in this way. However, as network science has matured and (especially) as ever more complicated data has become available, it has become increasingly important to develop tools to analyse more complicated structures. For example, many systems that were initially studied as simple graphs are now often represented as time-dependent networks, networks with multiple types of connections, or interdependent networks. This has allowed deeper and more realistic analyses of complex networked systems, but it has simultaneously introduced mathematical constructions, jargon, and methodology that are specific to research on each type of system. Recently, the concept of “multilayer networks” was developed to unify this disparate language (and disparate notation) and to bring together the different generalised network concepts that include layered graph structures. In this talk, I will introduce multilayer networks and discuss how to study their structure. Generalisations of the clustering coefficient for multiplex networks and of graph isomorphism for general multilayer networks are used as illustrative examples.
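
A small example shows why multiplex networks need their own clustering coefficients. The sketch below computes only two naive baselines, the ordinary local clustering coefficient in each layer separately and in the aggregated (flattened) graph; the generalised coefficients discussed in the talk are not reproduced, and the function names, layer names and edges are hypothetical.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = adj.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj.get(u, set()))
    return links / (k * (k - 1) / 2)

def multiplex_clustering(layers, node):
    """Per-layer clustering and clustering of the aggregated (flattened) graph.

    `layers` maps a layer name to its list of undirected edges.
    """
    per_layer = {name: local_clustering(build_adjacency(edges), node)
                 for name, edges in layers.items()}
    all_edges = [e for edges in layers.values() for e in edges]
    aggregated = local_clustering(build_adjacency(all_edges), node)
    return per_layer, aggregated
```

For node `"a"` with layers `{"friends": [("a", "b"), ("a", "c")], "work": [("b", "c")]}`, both per-layer coefficients are 0.0 while the aggregated coefficient is 1.0: the triangle exists only when edges from different layers are combined, and neither baseline records that fact.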

**Speaker:** Mikko Kivelä

**Affiliation:** Postdoctoral Researcher, Aalto University

**Place of Seminar:** Aalto University

##### Learning to Rank: Applications to Bioinformatics

**Date:** March 27, 2017

**Abstract:** Learning to Rank (LTR) was developed in information retrieval for ranking documents by their relevance to a given query. Typically, LTR builds a ranking model from given relevant (or irrelevant) query-document pairs. In some respects, LTR can be thought of as an attempt to solve a multilabel classification problem in which queries are labels. Many settings in bioinformatics can be turned into multilabel classification problems with relatively similar properties. One typical example is biomedical document annotation. PubMed, a database of 26 million biomedical citations, currently has around 30,000 keywords, called MeSH (Medical Subject Headings) terms, i.e., labels in multilabel classification, and the number of articles per MeSH term is extremely diverse, ranging from only 20 to more than eight million. This large, skewed dataset already goes beyond the settings that regular multilabel classifiers are designed for. In this talk, I will start with an introduction to and brief review of LTR. I will then present three bioinformatics multilabel classification problems that share practical properties derived from real data, which hamper the application of regular multilabel classifiers. Finally, I will show that LTR nicely addresses such large-scale, challenging bioinformatics multilabel classification problems.

A large portion of this talk appeared at ISMB 2015 and ISMB 2016.

**Speaker:** Hiroshi Mamitsuka

**Affiliation:** Professor, Kyoto University

**Place of Seminar:** University of Helsinki

##### Future AI: Autonomous Machine Learning and Beyond

**Date:** March 20, 2017

**Abstract:** Many researchers have identified autonomous machine learning (unsupervised, semi-supervised and reinforcement learning) as an important cornerstone of advanced artificial intelligence. The Curious AI Company is developing such autonomous learning systems. We already have state-of-the-art results in several semi-supervised classification tasks, and we are also working on bringing autonomy to learning segmentation and hierarchical control, both tasks that take a lot of human work when developing, for instance, self-driving cars. However, we believe there is an even more important obstacle on the way to advanced AI: the fundamental inability of currently used parallel distributed neural coding to properly represent objects and their interactions. We are working on deep learning networks whose neuro-symbolic representations will hopefully allow neural networks to understand the world not only as a collection of features but also in terms of objects and their interactions. This is necessary for many tasks such as communication, reasoning and complex decision making.

**Speaker:** Harri Valpola

**Affiliation:** CEO of the Curious AI Company

**Place of Seminar:** Aalto University

##### Small Data AUC Estimation of Machine Learning Methods: Pitfalls and Remedies

**Date:** March 13, 2017

**Abstract:** Asking whether two populations can be distinguished from each other is one of the most fundamental questions in data analysis, and the area under the ROC curve (AUC) is one of the simplest and most practical tools for answering it. Equivalent to the Wilcoxon-Mann-Whitney U statistic normalized by the number of positive-negative pairs, it can be associated with a p-value indicating how likely one would be to obtain an AUC at least as good if the two populations were not stochastically different. Estimating the AUC of a predictive model and its statistical significance is of great practical importance in fields like medicine, where one often has access to only small amounts of labeled data but a large number of features. Leave-pair-out cross-validation (LPOCV) is an almost unbiased AUC estimator for machine learning methods that has also been empirically shown to be the most reliable of the cross-validation (CV) based estimators. We further study the properties of LPOCV and show some serious pitfalls one can encounter when estimating AUC with CV, and how to avoid them. In particular, we show how one can produce very promising results with high AUC values even if there is no signal in the data. Finally, we show how to counter these risks with new Wilcoxon-Mann-Whitney U-type permutation tests adjusted for LPOCV, thus upgrading one of the classical statistical tools for CV estimates.
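
The equivalence between AUC and the Mann-Whitney U statistic is easy to see in code: AUC is the fraction of positive-negative pairs in which the positive example gets the higher score. The sketch below also adds a classical label-permutation test on a fixed set of scores. It assumes the scores come from a model that was not fitted to these examples, it does not include the LPOCV adjustment proposed in the talk, and the function names are hypothetical.

```python
import random

def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    This equals the Mann-Whitney U statistic divided by n_pos * n_neg;
    tied scores count as half a correctly ranked pair.
    """
    correct = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                correct += 1.0
            elif sp == sn:
                correct += 0.5
    return correct / (len(scores_pos) * len(scores_neg))

def permutation_p_value(scores_pos, scores_neg, n_perm=2000, seed=0):
    """One-sided p-value: how often shuffled labels give an AUC at least as high."""
    rng = random.Random(seed)
    observed = auc(scores_pos, scores_neg)
    pooled = list(scores_pos) + list(scores_neg)
    n_pos = len(scores_pos)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if auc(pooled[:n_pos], pooled[n_pos:]) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)
```

For instance, `auc([3, 4], [1, 2])` is 1.0 and `auc([1, 2], [3, 4])` is 0.0, while tied scores contribute half a correct pair.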

**Speaker:** Tapio Pahikkala

**Affiliation:** Assistant Professor, University of Turku

**Place of Seminar:** University of Helsinki

##### Differentially Private Bayesian Learning

**Date:** March 6, 2017

**Abstract:** Many applications of machine learning, for example in health care, would benefit from methods that can guarantee the privacy of data subjects. Differential privacy has recently emerged as a leading framework for private data analysis. Differential privacy guarantees privacy by requiring that the results of an algorithm should not change much even if one data point is changed, thus providing plausible deniability for the data subjects.

In this talk I will present methods for efficient differentially private Bayesian learning. In addition to asymptotic efficiency, we will focus on how to make the methods efficient for moderately sized data sets. The methods are based on perturbation of sufficient statistics for exponential family models and perturbation of gradients for variational inference. Unlike the previous state of the art, our methods can predict the drug sensitivity of cancer cell lines using differentially private linear regression more accurately than non-private regression on a very small data set.
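
Sufficient-statistic perturbation is easiest to see in the simplest exponential family model. The sketch below assumes binary data with a Beta prior on the Bernoulli rate, treats the data set size `n` as public (neighbouring data sets differ by changing one data point, as in the definition above), and applies the Laplace mechanism to the count of ones. It is an illustrative toy with hypothetical function names, not the speaker's method for linear regression or variational inference.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_beta_posterior(bits, epsilon, prior_a=1.0, prior_b=1.0, seed=0):
    """Beta posterior for a Bernoulli rate from a privatised sufficient statistic.

    The sufficient statistic is the count of ones. Changing one data point
    changes that count by at most 1 (sensitivity 1), so Laplace noise with
    scale 1/epsilon makes the released count epsilon-differentially private.
    The noisy count is clipped to [0, n] before the conjugate Beta update.
    """
    rng = random.Random(seed)
    n = len(bits)
    noisy_ones = sum(bits) + laplace_noise(1.0 / epsilon, rng)
    noisy_ones = min(max(noisy_ones, 0.0), float(n))
    return prior_a + noisy_ones, prior_b + (n - noisy_ones)
```

Clipping the noisy count to `[0, n]` is post-processing, so it does not weaken the privacy guarantee; a smaller `epsilon` means more noise and a posterior further from the non-private one.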

**Speaker:** Antti Honkela

**Affiliation:** Assistant Professor, University of Helsinki

**Place of Seminar:** Aalto University

##### Inverse Modeling in Behavioral Sciences and HCI

**Date:** February 27, 2017

**Abstract:** Can one make deep inferences about a person based only on observations of how she acts? I discuss methodology for inverse modeling in the behavioral sciences, where the goal is to estimate a cognitive model from limited behavioral data. Given the substantial diversity in people’s intentions, strategies and abilities, this is a difficult and previously unaddressed problem. I discuss advances achieved with an approach that combines (1) computational rationality, to predict how a person adapts to a task when her capabilities are known, and (2) Approximate Bayesian Computation (ABC), to estimate those capabilities. The benefit is that model parameters are conditioned on both prior knowledge and observations, which improves model validity and helps identify the causes of observations. Inverse modeling methods can advance theory formation by bringing complex behavior within reach of modeling. This talk is based on ongoing collaborations with Antti Kangasraasio, Samuel Kaski, Jukka Corander, Andrew Howes, Kumaripaba Athukorala, Jussi Jokinen, Sayan Sarcar, and Xiangshi Ren.

**Speaker:** Antti Oulasvirta

**Affiliation:** Associate Professor, Aalto University

**Place of Seminar:** University of Helsinki

##### Compressed Sensing for Semi-Supervised Learning From Big Data Over Networks

**Date:** February 20, 2017

**Abstract:** In this talk I will present some of our most recent work on applying compressed sensing to semi-supervised learning from massive network-structured datasets, i.e., big data over networks. We expect the use of compressed sensing ideas to be game-changing for machine learning from big data, in a similar manner as it was for digital signal processing. In particular, I will present a sparse label propagation algorithm that efficiently learns from large amounts of network-structured unlabeled data by leveraging the information provided by a few initially labelled training data points. This algorithm is inspired by compressed sensing recovery methods and admits a simple sufficient condition on the network structure that guarantees accurate learning.

**Speaker:** Alexander Jung

**Affiliation:** Assistant Professor, Aalto University

**Place of Seminar:** Aalto University

##### Variable Selection From Summary Statistics

**Date:** February 13, 2017

**Abstract:** With the increasing capability to measure massive numbers of variables, efficient variable selection methods are needed to improve our understanding of the underlying data-generating processes. This is evident, for example, in human genomics, where genomic regions showing association with a disease may contain thousands of highly correlated variants, while we expect only a small number of them to be truly involved in the disease process. I outline the following recent ideas that have made variable selection practical in human genomics and demonstrate them through our experiences with the FINEMAP algorithm (Benner et al. 2016, Bioinformatics).

(1) Compressing data into lightweight summaries, to avoid the logistical and privacy concerns of sharing complete data and to minimize the computational overhead.

(2) Efficient implementation of sparsity assumptions.

(3) Efficient stochastic search algorithms.

(4) Use of public reference databases to complement the available summary statistics.

**Speaker:** Matti Pirinen

**Affiliation:** Academy Research Fellow, Institute for Molecular Medicine Finland, University of Helsinki

**Place of Seminar:** University of Helsinki

##### Towards Perfect Density Estimation

**Date:** February 6, 2017

**Abstract:** We start with a very simple problem, estimating a one-dimensional density function, and argue that despite its apparent simplicity, it is surprisingly difficult to solve in a holistic manner that is both computationally feasible and theoretically justifiable without strong distributional or other assumptions. We demonstrate how the information-theoretic minimum description length (MDL) framework can be used to reach this goal (almost) perfectly, and show how this simple setup gives interesting perspectives on the fundamental concepts of probabilistic modelling and statistical inference. We also discuss ideas for extending the framework to more complex models with additional practical applications.
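
One concrete way to apply MDL to one-dimensional density estimation is to choose a histogram by its normalised maximum likelihood (NML) code length. The sketch below is a simplification under stated assumptions: equal-width bins on a known range `[lo, hi]`, code lengths in nats up to a constant that does not depend on the number of bins, and the multinomial NML normalising sum computed with a known linear-time recurrence. It is not the estimator presented in the talk, and all function names are hypothetical.

```python
import math

def nml_multinomial_complexity(K, n):
    """Normalising sum C(K, n) of the multinomial NML distribution.

    Uses C(1, n) = 1, a direct sum for C(2, n) (in log space to avoid
    overflow), and the recurrence C(k, n) = C(k-1, n) + n / (k-2) * C(k-2, n).
    """
    if n == 0:
        return 1.0
    c_prev, c = 1.0, 0.0
    for h in range(n + 1):
        log_term = math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
        if h > 0:
            log_term += h * math.log(h / n)
        if h < n:
            log_term += (n - h) * math.log((n - h) / n)
        c += math.exp(log_term)
    if K == 1:
        return c_prev
    for k in range(3, K + 1):
        c_prev, c = c, c + n / (k - 2) * c_prev
    return c

def histogram_code_length(data, K, lo, hi):
    """NML code length (nats, up to a constant) of an equal-width K-bin histogram."""
    n = len(data)
    width = (hi - lo) / K
    counts = [0] * K
    for x in data:
        idx = min(int((x - lo) / width), K - 1)  # x == hi goes into the last bin
        counts[idx] += 1
    # Negative maximised log-likelihood of the histogram density h_k / (n * width)
    fit = -sum(h * math.log(h / n) for h in counts if h > 0) + n * math.log(width)
    return fit + math.log(nml_multinomial_complexity(K, n))

def best_bin_count(data, max_bins, lo, hi):
    """Number of bins with the shortest NML code length."""
    return min(range(1, max_bins + 1),
               key=lambda K: histogram_code_length(data, K, lo, hi))
```

Because the complexity term grows with `K`, roughly uniform data tend to favour a single bin, while clearly multimodal data justify more bins.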

**Speaker:** Petri Myllymäki

**Affiliation:** Professor, University of Helsinki

**Place of Seminar:** Aalto University

##### Likelihood-free Inference and Predictions for Computational Epidemiology

**Date:** January 30, 2017

**Abstract:** Simulator-based models often allow inference and predictions under more realistic assumptions than those employed in standard statistical models. For example, the observation model for an underlying stochastic process can be more freely chosen to reflect the characteristics of the data gathering procedure. A major obstacle for such models is the intractability of the likelihood, which has to a large extent hampered their practical applicability. I will discuss recent advances in likelihood-free inference that greatly accelerate the model fitting process by exploiting a combination of machine learning techniques. Applications to several novel models in infectious disease epidemiology are used to illustrate the potential offered by this approach.

**Speaker:** Jukka Corander

**Affiliation:** Professor, University of Helsinki and University of Oslo

**Place of Seminar:** University of Helsinki

##### Metabolite Identification Through Machine Learning

**Date:** January 23, 2017

**Abstract:** Identification of small molecules from biological samples remains a major bottleneck in understanding the inner workings of biological cells and their environment. Machine learning on large public databases of tandem mass spectrometry data has transformed this field in recent years, increasing identification rates by 150%. In this presentation, I will outline the key machine learning methods behind this development (kernel-based learning of molecular fingerprints, multiple kernel learning, and structured prediction) as well as some recent advances.

**Speaker:** Juho Rousu

**Affiliation:** Associate Professor, Aalto University

**Place of Seminar:** Aalto University

##### Probabilistic Programming: Bayesian Modeling Made Easy

**Date:** January 16, 2017

**Abstract:** Probabilistic models are principled tools for understanding data, but the difficulty of inference limits the complexity of the models we can actually use. Often we need to develop specific inference algorithms for new models (which might take months) and restrict ourselves to tractable model families that might not match our beliefs about the data. Probabilistic programming promises to fix this by separating the model description from the inference: with probabilistic programming languages, we can specify complex models in a high-level programming language and let a black-box inference engine take care of the tricky details. This talk covers the basic idea of probabilistic programming and discusses how well its promises hold now and in the future.
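
The separation between model and inference can be illustrated with a toy engine. In the sketch below, the model is an ordinary Python function that only calls `sample` and `observe`, and a generic likelihood-weighting engine (importance sampling from the prior) does the inference without knowing anything about the model. The class and method names are hypothetical and do not mirror the API of any particular probabilistic programming language.

```python
import random

class LikelihoodWeighting:
    """Toy inference engine: likelihood weighting over a model function."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.weight = 1.0

    def sample(self, draw):
        """Draw a latent variable from its prior via draw(rng)."""
        return draw(self.rng)

    def observe(self, likelihood):
        """Condition on data by scaling the current run's importance weight."""
        # Plain products are fine for small models; larger ones need log weights.
        self.weight *= likelihood

    def posterior_mean(self, model, n_runs):
        """Weighted average of the model's return value over n_runs prior runs."""
        total_w = total_wv = 0.0
        for _ in range(n_runs):
            self.weight = 1.0
            value = model(self)
            total_w += self.weight
            total_wv += self.weight * value
        return total_wv / total_w

def coin_model(engine, flips=(1, 1, 0, 1, 1, 1, 0, 1, 1, 0)):
    """Model only: coin bias p ~ Uniform(0, 1), each flip ~ Bernoulli(p)."""
    p = engine.sample(lambda rng: rng.random())
    for f in flips:
        engine.observe(p if f == 1 else 1.0 - p)
    return p
```

With a uniform prior and seven heads in ten flips, the exact posterior is Beta(8, 4) with mean 2/3, so `LikelihoodWeighting(seed=0).posterior_mean(coin_model, 20000)` can be checked against the analytic answer. Likelihood weighting is simple but inefficient when the posterior is far from the prior; practical engines typically use MCMC or variational inference instead.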

**Speaker:** Arto Klami

**Affiliation:** Academy Research Fellow, University of Helsinki

**Place of Seminar:** University of Helsinki

##### Unsupervised Machine Learning for Matrix Decomposition

**Date:** January 9, 2017

**Abstract:** Unsupervised learning is a classical approach in pattern recognition and data analysis. Its importance is growing today due to increasing data volumes and the difficulty of obtaining statistically sufficient amounts of labelled training data. Typical analysis techniques based on unsupervised learning include principal component analysis, independent component analysis, and cluster analysis. They can all be presented as decompositions of the data matrix containing the unlabelled samples. Starting from the classical results, the author reviews some advances in the field up to the present day.
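
PCA is the most familiar of these decompositions. The sketch below assumes samples are stored in the rows of `X` and computes PCA from the singular value decomposition of the centred data, so that the centred data matrix factors exactly into scores times principal directions when all components are kept. ICA and clustering would need different factorisations and are not shown; the function name is hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """PCA of a data matrix X (samples in rows) via the SVD of the centred data.

    Returns (components, scores, explained_variance): the rows of
    `components` are orthonormal principal directions, and
    scores @ components approximates the centred data (exactly when all
    components are kept).
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = U[:, :n_components] * s[:n_components]
    # Variance along each direction, using the unbiased (n - 1) normalisation
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    return components, scores, explained_variance
```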

**Speaker:** Erkki Oja

**Affiliation:** Professor Emeritus, Aalto University

**Place of Seminar:** Aalto University