HIIT Kumpula Seminar: Two Applications of Stochastic Complexity

Lecturer : 
Ciprian Giurcaneanu
Event type: 
HIIT seminar
Event time: 
2016-03-11 10:15 to 11:00
Place: 
Exactum B119
Description: 

Abstract:

Stochastic Complexity (SC) was introduced by Rissanen in 1978, and various forms of it have been derived since then. Under the Minimum Description Length (MDL) principle, SC is defined in the context of transmitting the observed data to a decoder. The “encoding” is performed using mathematical models from a predefined class, and the model that yields the shortest code length is deemed the most suitable for describing the data. In this talk, we discuss two recent applications of SC. The first is a difficult biological problem: identifying the “correct” evolutionary tree. These results were obtained by the author together with Cenanning Li and Dr. Peter J. Waddell. The second application, joint work with Said Maanan and Prof. Bogdan Dumitrescu, employs a novel SC criterion for selecting the order of vector autoregressive (VAR) processes. We pay special attention to models whose inverse spectral density matrix (ISDM) has a specific sparsity pattern. Interest in these models is motivated by the relationship between the sparsity structure of the ISDM and the problem of inferring the conditional independence graph of a multivariate time series.
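As a rough illustration of the MDL idea of choosing the model with the shortest code length, the Python sketch below selects the order of a VAR process using the classical two-part code-length approximation (N/2) log det(Sigma_hat) + (k/2) log N, which is essentially BIC. This criterion is an assumption chosen for simplicity: it is not the novel SC criterion from the talk, and it ignores the ISDM sparsity constraints. The function names and the simulated VAR(2) coefficients are hypothetical.

```python
import numpy as np


def lagged_design(x, p, max_order):
    """Return targets and stacked lags x_{t-1}, ..., x_{t-p}; every order uses the same targets."""
    n, d = x.shape
    y = x[max_order:]  # targets x_t for t = max_order, ..., n-1
    if p == 0:
        return y, np.zeros((n - max_order, 0))
    lags = [x[max_order - k : n - k] for k in range(1, p + 1)]
    return y, np.hstack(lags)


def code_length(x, p, max_order):
    """Two-part code length in nats (up to order-independent constants) of a zero-mean Gaussian VAR(p)."""
    y, z = lagged_design(x, p, max_order)
    n_eff, d = y.shape
    if p == 0:
        resid = y
    else:
        coef, *_ = np.linalg.lstsq(z, y, rcond=None)  # least-squares VAR fit
        resid = y - z @ coef
    sigma = resid.T @ resid / n_eff  # ML estimate of the noise covariance
    _, logdet = np.linalg.slogdet(sigma)
    k = p * d * d  # number of free autoregressive coefficients
    return 0.5 * n_eff * logdet + 0.5 * k * np.log(n_eff)


def select_var_order(x, max_order):
    """Pick the order whose model gives the shortest two-part code length."""
    lengths = [code_length(x, p, max_order) for p in range(max_order + 1)]
    return int(np.argmin(lengths)), lengths


# Simulate a stable bivariate VAR(2) with hypothetical coefficients.
rng = np.random.default_rng(0)
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
A2 = np.array([[-0.3, 0.0], [0.1, -0.2]])
n = 2000
x = np.zeros((n, 2))
for t in range(2, n):
    x[t] = A1 @ x[t - 1] + A2 @ x[t - 2] + rng.standard_normal(2)

p_hat, lengths = select_var_order(x, max_order=6)
print("selected order:", p_hat)
```

All candidate orders are scored on the same N = n - max_order targets so that their code lengths are directly comparable.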

Bio:

CD Giurcaneanu received the Ph.D. degree (with commendation) from Tampere University of Technology (TUT), Finland, in 2001. From 1993 to 1997, he was a Junior Assistant at "Politehnica" University of Bucharest. In 1997 he joined TUT, where he spent more than 14 years as a Researcher, Senior Researcher and Academy Research Fellow in the Department of Signal Processing. From January 2012 to June 2012 he was with the Helsinki Institute for Information Technology (HIIT), and in July 2012 he joined the Department of Statistics, University of Auckland, where he is currently a Senior Lecturer. His research focuses mainly on stochastic complexity and its applications.

HIIT Open 2016 - Programming Contest

Event type: 
Event
Event time: 
2016-05-28 10:00 to 15:00
Place: 
Otaniemi, Espoo
Description: 

Can you solve algorithmic programming challenges, efficiently and correctly, in practice, as a team, under time pressure?

The contest is open to everyone, including university students, high-school students, and teams from companies.

See the contest web page for more information.

#DHH16 Helsinki Digital Humanities Hackathon 2016

Event type: 
Event
Event time: 
2016-05-16 09:00 to 2016-05-20 17:00
Place: 
Minerva Square, Siltavuorenpenger 5 A, University of Helsinki
Description: 

This course aims to bring together students and researchers from the humanities, social sciences and computer science for a week of active co-operation in groups under the heading of Digital Humanities.

Digital Humanities, as understood here, is the use of computer science to aid research in the humanities and social sciences (e.g. in fields such as linguistics, literature, art, culture, history, sociology and philosophy of language). Data of interest to researchers in the humanities is increasingly available in digital form. However, the tools and understanding needed to turn that data into relevant conclusions are often still lacking.

Here, collaboration across disciplines is essential. People in the humanities and social sciences have an in-depth understanding of their fields and are able to pose challenging research questions that could, in theory, be answered using digital collections. Computer scientists, on the other hand, are needed to solve the complex theoretical, algorithmic and tool-development challenges that currently stand in the way of such research.

The idea of this hackathon is to offer students and researchers from different backgrounds an opportunity to approach digital humanities through hands-on practice.

Sampling from scarcely defined distributions: Methods and applications in data mining

Event type: 
Doctoral dissertation
Respondent: 
Aleksi Kallio
Opponent: 
Dr. Pauli Miettinen, Max-Planck-Institut für Informatik, Germany
Custos: 
Professor Aristides Gionis
Event time: 
2016-02-19 12:00 to 14:00
Place: 
T2 lecture hall, Konemiehentie 2, 02150, Espoo, FI
Description: 

Aleksi Kallio, M.Sc., will defend the dissertation "Sampling from scarcely defined distributions: Methods and applications in data mining" on 19.2. at 12 noon in Aalto University School of Science, lecture hall T2, Konemiehentie 2, Espoo.

Reliability and reproducibility of discoveries are essential for scientific progress. In his dissertation, Aleksi Kallio, M.Sc., studied difficult cases of scientific data analytics and developed new methods and approaches for assessing the statistical significance of discoveries. Improved methods are needed because of the rapidly growing volume of data and the increasingly complex analytical questions faced in modern research.
 
The dissertation introduces the term scarcely defined distributions to describe difficult statistical distributions that are common in modern data analytics. It discusses methods and applications of data mining in which scarcely defined distributions emerge, and puts forth several strategies for analyzing such complex datasets. Applications are reviewed from several fields, including bioinformatics, paleontology and ecology. A common factor across these application areas is the complexity of the underlying processes and sources of error.
 
The work concludes that developing new and flexible analytical methods is crucial for all fields that wish to use data to support decision making and prediction. If testing for significance and reliability is not on par with the rest of the data processing machinery, then the future of data-driven discovery will be plagued by false interpretations. The applicability of the research extends beyond the fields discussed: the generic methods and approaches can be adapted to many use cases where complex data sources are relevant, including major societal questions related to medicine, climate and social networks.
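The dissertation is about assessing the significance of discoveries by sampling. As a much simpler, generic illustration of sampling-based significance testing (not one of the dissertation's methods), the Python sketch below estimates a Monte Carlo permutation p-value for the correlation between two variables. It uses the (hits + 1) / (samples + 1) estimator so that the p-value is never exactly zero. The function name and its parameters are hypothetical.

```python
import numpy as np


def permutation_p_value(x, y, n_samples=999, seed=0):
    """Monte Carlo p-value for |corr(x, y)| under the null hypothesis of no association."""
    rng = np.random.default_rng(seed)
    observed = abs(np.corrcoef(x, y)[0, 1])
    hits = 0
    for _ in range(n_samples):
        # Shuffling y destroys any x-y association while keeping both marginals intact.
        y_null = rng.permutation(y)
        if abs(np.corrcoef(x, y_null)[0, 1]) >= observed:
            hits += 1
    # Counting the observed data as one null sample keeps the estimate valid and nonzero.
    return (hits + 1) / (n_samples + 1)


x = np.arange(50, dtype=float)
print(permutation_p_value(x, 2 * x + 1))
```

For a perfectly linear relationship essentially no shuffle matches the observed correlation, so the estimate sits at its floor of 1 / (n_samples + 1). The harder cases the dissertation targets are those where such a simple null distribution cannot be written down or sampled directly.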
 
Custos: Professor Aristides Gionis, Aalto University School of Science, Department of Computer Science
 
 
School of Science, electronic dissertations: https://aaltodoc.aalto.fi/handle/123456789/52 

HIIT Guest Lecture: Caroline Colijn and Jakub Truszkowski

Lecturer : 
Caroline Colijn and Jakub Truszkowski
Event type: 
Guest lecture
Event time: 
2016-05-18 14:15 to 16:00
Place: 
Exactum B119
Description: 

Caroline Colijn: Informative comparisons between phylogenetic trees

@ 14:15

Abstract: There is increasing interest in using phylogenetic trees to infer evolutionary and epidemiological processes. Indeed, understanding which processes give rise to the patterns of diversity and ancestry we observe is a central question in evolutionary biology. In the absence of explicit likelihood models for a tree under a given process, it is natural to turn to likelihood-free methods such as approximate Bayesian computation (ABC) to connect models to observations. However, this presents some significant challenges, including developing informative summary statistics and appropriate, scalable tools for comparing trees. The task is made more challenging by the fact that the individual taxa or species in a stochastic simulation model are random and do not map individually to observed species. As a result, coarse summary measures such as tree imbalance and overall diversity have typically been used to date. But coarse summaries may not discriminate well between evolutionary models. Here, I will introduce a metric (in the sense of a true distance function) on the space of unlabelled tree shapes and illustrate its ability to discriminate between generative models. I will also describe a suite of informative summary statistics. Together, these tools set the stage for improved ABC inference of evolutionary and epidemiological processes from phylogenetic trees.
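To make the point about coarse summaries concrete, the Python sketch below computes the Colless imbalance index (the sum over internal nodes of the absolute difference in leaf counts between the two subtrees) and a canonical string for an unlabelled binary tree shape. The two six-leaf shapes have the same Colless index but are different shapes, so imbalance alone cannot tell them apart. This is not the metric from the talk; the nested-tuple tree encoding and the function names are assumptions made for illustration.

```python
from typing import Tuple, Union

# An unlabelled binary tree shape: either a leaf or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]
LEAF = "L"


def n_leaves(t: Tree) -> int:
    return 1 if t == LEAF else n_leaves(t[0]) + n_leaves(t[1])


def colless(t: Tree) -> int:
    """Sum over internal nodes of |leaves(left) - leaves(right)|."""
    if t == LEAF:
        return 0
    left, right = t
    return abs(n_leaves(left) - n_leaves(right)) + colless(left) + colless(right)


def canonical(t: Tree) -> str:
    """Child-order-independent encoding: equal strings iff equal unlabelled shapes."""
    if t == LEAF:
        return LEAF
    a, b = sorted((canonical(t[0]), canonical(t[1])))
    return f"({a},{b})"


cherry = (LEAF, LEAF)
shape_a = (cherry, (cherry, cherry))        # 2 | 4 split at the root
shape_b = ((cherry, LEAF), (cherry, LEAF))  # 3 | 3 split at the root

print(colless(shape_a), colless(shape_b))        # both equal 2
print(canonical(shape_a) == canonical(shape_b))  # False: different shapes
```

A true metric on tree shapes, as described in the abstract, would assign these two trees a positive distance even though this coarse summary cannot separate them.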

Bio: My work is at the interface of mathematics and the epidemiology and evolution of pathogens. I hold an EPSRC Fellowship with the broad aim of developing the mathematical tools to connect sequence data for pathogens to pathogen ecology. I also have a long-standing interest in the dynamics of diverse interacting pathogens. For example, how does the interplay between co-infection, competition and selection drive the development of antimicrobial resistance? To answer these questions, my group is building new approaches to analysing phylogenetic trees derived from pathogen sequence data, studying tree space and branching processes, and doing ecological and epidemiological modelling.

Jakub Truszkowski: Fast algorithms for phylogenetic reconstruction and inferring somatic evolution from single-cell sequencing

@ ~ 15:15

Abstract: Advances in sequencing technology are creating new challenges and opportunities for phylogenetics. The falling cost of sequencing has increased the amount of data available to biologists. Large sequence alignments can now contain up to hundreds of thousands of sequences, making traditional tree building methods, such as Neighbor Joining, computationally prohibitive. In parallel to this, new data types, such as single-cell sequencing data, are creating a need for novel analysis methods.

In this talk, I will present two principled algorithms that aim to address these challenges. I will first talk about LSHTree, the first sub-quadratic-time phylogenetic reconstruction algorithm with mathematical accuracy guarantees under a Markov model of sequence evolution. Our new algorithm runs in $O(n^{1+\gamma(g)} \log^2 n)$ time, where $n$ is the number of sequences, $g$ is an upper bound on the mutation rate along any branch of the phylogeny, $\gamma$ is an increasing function of $g$, and $\gamma(g) < 1$ for all $g$. This is achieved by using hashing techniques to quickly identify closely related sequences. For phylogenies with very short branches, the running time of our algorithm is close to linear. In experiments, our implementation is more accurate than current fast algorithms while being comparably fast.
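The abstract says LSHTree uses hashing to find closely related sequences quickly. A standard way to do this for aligned sequences is bit-sampling locality-sensitive hashing for Hamming distance: each hash table keys sequences by their characters at k randomly chosen alignment columns, so near-identical sequences tend to collide in at least one table while distant ones rarely do. The Python sketch below shows that generic scheme under our own assumptions (equal-length aligned sequences, hypothetical parameters `k` and `n_tables`); LSHTree's actual hashing and tree-building steps may differ.

```python
import random
from collections import defaultdict
from itertools import combinations


def lsh_candidate_pairs(seqs, k, n_tables, seed=0):
    """Return index pairs (i, j), i < j, that share a bucket in at least one hash table."""
    length = len(seqs[0])  # assumes all sequences are aligned to the same length
    rng = random.Random(seed)
    candidates = set()
    for _ in range(n_tables):
        # Each table samples k distinct alignment columns as its hash function.
        positions = rng.sample(range(length), k)
        buckets = defaultdict(list)
        for idx, s in enumerate(seqs):
            buckets[tuple(s[p] for p in positions)].append(idx)
        # Sequences in the same bucket agree on all k sampled columns.
        for members in buckets.values():
            candidates.update(combinations(members, 2))
    return candidates


seqs = ["ACGTACGTAC", "ACGTACGTAC", "TTTTGGGGCC"]
print(lsh_candidate_pairs(seqs, k=4, n_tables=5))
```

Only candidate pairs then need exact distance computations, which is how hashing can avoid comparing all n(n-1)/2 pairs. Larger `k` makes buckets stricter; more tables raise the chance that truly close pairs collide at least once.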

In the second part of the talk, I will present our current work on building cell lineage trees from single-cell sequencing data. Single-cell sequencing aims to survey the genomic heterogeneity of cells within an organism. This problem differs from standard phylogenetic reconstruction problems due to very low mutation rates and high sequencing error rates resulting from allelic dropout. We have developed a method for reconstructing cell evolutionary histories while accounting for the high rate of sequencing errors. The problem of inferring the most likely history can be reduced to finding a series of graph cuts in a certain graph. In simulations, we show that our method outperforms standard phylogenetic methods for this task. Initial results on real data sets are promising.
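The abstract reduces inferring the most likely cell history to a series of graph cuts but does not specify the graph, so the sketch below shows only the underlying primitive: a minimum s-t cut computed with the Edmonds-Karp max-flow algorithm. By max-flow/min-cut duality, the vertices still reachable from s in the final residual graph form the source side of a minimum cut. The toy graph and the dict-of-dicts capacity format are hypothetical.

```python
from collections import deque


def min_st_cut(capacity, s, t):
    """Edmonds-Karp max flow; returns (cut value, source side of a minimum s-t cut).

    capacity maps u -> {v: nonnegative capacity of edge u->v}.
    """
    # Residual graph with a reverse edge (initially 0) for every original edge.
    residual = {}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {}).setdefault(v, 0)
            residual[u][v] += c
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path s -> t.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break  # no augmenting path left: the flow is maximum
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

    # After the last (failed) search, parent holds every vertex reachable from s.
    return flow, set(parent)


capacity = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
value, source_side = min_st_cut(capacity, "s", "t")
print(value, sorted(source_side))
```

Breadth-first search is used for the augmenting paths because shortest paths bound the number of augmentations polynomially, independent of the capacity values.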

Bio: Jakub Truszkowski is a postdoctoral fellow at the European Bioinformatics Institute and Cancer Research UK Cambridge Institute. He holds a Ph.D. in computer science from the University of Waterloo, Canada, and an M.Sc. from Gdansk University of Technology, Poland. His research focuses on scalable algorithms for problems in phylogenetics and sequence analysis.
