Combinatorial Algorithms with Applications in Learning Graphical Models

Lecturer : 
Juho-Kustaa Kangas
Event type: 
Doctoral dissertation
Doctoral dissertation
Respondent: 
Juho-Kustaa Kangas
Opponent: 
James Cussens, Senior Lecturer, University of York
Custos: 
Professor Petri Myllymäki, University of Helsinki
Event time: 
2016-12-09 14:00 to 17:00
Place: 
Exactum, Auditorium CK112, Gustaf Hällströmin katu 2b, Helsinki
Description: 

Abstract

Graphical models are a framework for representing joint distributions over random variables. By capturing the structure of conditional independencies between the variables, a graphical model can express the distribution in a concise factored form that is often efficient to store and reason about.

As constructing graphical models by hand is often infeasible, a lot of work has been devoted to learning them automatically from observational data. Of particular interest is the so-called structure learning problem, of finding a graph that encodes the structure of probabilistic dependencies. Once the learner has decided what constitutes a good fit to the data, the task of finding optimal structures typically involves solving an NP-hard problem of combinatorial optimization. While first algorithms for structure learning thus
resorted to local search, there has been a growing interest in solving the problem to a global optimum. Indeed, during the past decade multiple exact algorithms have been proposed that are guaranteed to find optimal structures for the family of Bayesian networks, while first steps have been taken for the family of decomposable graphical models.

This thesis presents combinatorial algorithms and analytical results with applications in the structure learning problem. For decomposable models, we present exact algorithms for the so-called full Bayesian approach, which involves not only finding individual structures of good fit but also computing posterior expectations of graph features, either by exact computation or via Monte Carlo methods.

For Bayesian networks, we study the empirical hardness of the structure learning problem, with the aim of being able to predict the running time of various structure learning algorithms on a given problem instance. As a result, we obtain a hybrid algorithm that effectively combines the best-case performance of multiple existing techniques.

Lastly, we study two combinatorial problems of wider interest with relevance in structure learning. First, we present algorithms for counting linear extensions of partially ordered sets, which is required to correct bias in MCMC methods for sampling Bayesian network structures. Second, we give results in the extremal combinatorics of connected vertex sets, whose number bounds the running time of certain algorithms for structure learning and various other problems.

CS Forum: Roger Wattenhofer

Lecturer : 
Roger Wattenhofer
Event type: 
Guest lecture
Event time: 
2016-11-18 14:15 to 15:00
Place: 
T2, CS building, Konemiehentie 2
Description: 

 

Speaker: Prof Roger Wattenhofer
Speaker affiliation: ETH Zurich
Host: Prof Jukka Suomela
Time: 14:15 (coffee at 14:00)
Venue: T2, CS building, Konemiehentie 2

 

 

Cryptocurrencies: Bitcoin, Blockchain & Beyond

Abstract

I will first give a short introduction to the Bitcoin system, explaining some of the basics such as transactions and the blockchain. Then, I will discuss some interesting technical aspects in more detail, regarding the stability, security, and scalability of Bitcoin. In particular, I will discuss Bitcoin's eventual consistency, and the related problem of double spending. I will shed some light into our findings regarding the bankruptcy of MtGox, previously the dominant Bitcoin exchange service. And I will present duplex micropayment channels. Apart from scalability, these channels also guarantee end-to-end security and instant transfers, laying the foundation of a network of payment service providers.

Bio

Roger Wattenhofer is a full professor at the Information Technology and Electrical Engineering Department, ETH Zurich, Switzerland. He received his doctorate in Computer Science in 1998 from
ETH Zurich. From 1999 to 2001 he was in the USA, first at Brown University in Providence, RI, then
at Microsoft Research in Redmond, WA. He then returned to ETH Zurich, originally as an assistant
professor at the Computer Science Department.

Roger Wattenhofer's research interests are a variety of algorithmic and systems aspects in computer science and information technology, currently in particular wireless networks, wide area networks, mobile systems, social networks, and physical algorithms. He publishes in different communities: distributed computing (e.g., PODC, SPAA, DISC), networking (e.g., SIGCOMM, MobiCom, SenSys), or theory (e.g., STOC, FOCS, SODA, ICALP). He recently published the book "The Science of the Blockchain". A complete CV is available here.

CS Forum: Thomas Brouwer

Lecturer : 
Thomas Brouwer
Event type: 
Guest lecture
Event time: 
2016-11-07 12:15 to 13:00
Place: 
T6, CS building, Konemiehentie 2
Description: 

Speaker: Thomas Brouwer
Speaker affiliation: University of Cambridge, UK
Host: Prof Samuel Kaski
Time: 12:15 (coffee at 12:00)
Venue: T6, CS building, Konemiehentie 2

 


Bayesian data integration by multiple matrix tri-factorisation

Abstract

The amount of biological -omics data has increased dramatically in recent years, allowing us to better understand biological processes. One of the main challenges now is to integrate these different datasets and draw meaningful conclusions. One way to do this is by using a family of algorithms called matrix factorisation. These methods aim to extract hidden patterns from a matrix by decomposing it into smaller matrices. By sharing these so-called latent factors, we can jointly study multiple datasets.

In this talk I will present our own model for data integration, based on Bayesian non-negative matrix tri-factorisation. This approach allows us to share more latent information than competing methods. We demonstrate our model on multiple drug sensitivity datasets, and showcase better predictive power in cross-validation. Furthermore we will consider the problem of integrating gene expression and methylation data.

Bio

Thomas Brouwer is a PhD student under Pietro Lio' in machine learning and bioinformatics at the Computer Laboratory, University of Cambridge, where he also obtained his BA in Computer Science in 2014. His research is focused on developing Bayesian probabilistic models for analysing and integrating biological datasets, mainly using matrix factorisation methods. He focuses on drug development datasets, in particular for drug combinations, repositioning, and sensitivity prediction.

Lower bounds in distributed computing

Lecturer : 
Juho Hirvonen
Event type: 
Doctoral dissertation
Doctoral dissertation
Respondent: 
Juho Hirvonen
Opponent: 
Professor Michael Elkin, Ben-Gurion University of Negev, Israel
Custos: 
Professor Jukka Suomela, Aalto University School of Science, Department of Computer Science
Event time: 
2016-11-25 10:00 to 13:00
Place: 
lecture hall T2, Konemiehentie 2, Espoo
Description: 

Counting, clocking and colouring: Fault-tolerant distributed coordination

Lecturer : 
Joel Rybicki
Event type: 
Doctoral dissertation
Doctoral dissertation
Respondent: 
Joel Rybicki
Opponent: 
Professor Roger Wattenhofer, ETH Zurich, Switzerland
Custos: 
Professor Jukka Suomela, Aalto University School of Science, Department of Computer Science
Event time: 
2016-11-19 12:00 to 15:00
Place: 
lecture hall SOK A301 of the Aalto University Töölö Campus Main building, Runeberginkatu 14-16, Helsinki
Description: 

Abstract

This thesis studies fault-tolerant co-ordination and synchronisation in distributed systems. Such systems are prevalent in computing today: they range from large-scale communications networks to integrated hardware circuits. In essence, a distributed system consists of independent computational entities called nodes that solve some task together. In such a system, a common notion of time is critical for ensuring correct co-ordination and scheduling of actions.

However, distributed systems are prone to scenarios in which the system may experience arbitrary transient failures, for example, due to various external disruptions. During a transient failure, the system ceases to operate correctly for some time which may put the system into an inconsistent state or even permanently damage parts of the system. In particular, the nodes in the system can lose synchrony. This thesis proposes synchronisation and scheduling algorithms that efficiently recover from such failures.

The first part of this thesis considers self-stabilising Byzantine fault-tolerant algorithms. This means that we assume a strong adversarial fault model: the system starts from an arbitrary initial state and a large fraction of the nodes may be faulty. The faulty nodes may exhibit arbitrary misbehaviour, and in particular, give inconsistent information to other nodes. In this setting, we study the synchronous counting problem, which is a basic primitive for establishing a so-called digital clock. The task is to guarantee that eventually all correct nodes 

The main contribution of the first part is an array of techniques for designing fast and communication-efficient counting algorithms. In particular, we provide deterministic algorithms with asymptotically optimal stabilisation time and optimal resilience. That is, the algorithms quickly converge to correct behaviour and tolerate the largest possible number of faulty nodes. Compared to prior work, our solutions give an exponential improvement in the number of bits stored and transmitted by each node without resorting to randomisation or suboptimal resilience.

The second part of this thesis investigates computational algorithm design and synthesis techniques. That is, we show how to automatically construct correct algorithms or prove their non-existence using computers. This is particularly attractive in the context of fault-tolerant systems, in which finding correct solutions is often a difficult task. We propose synthesis techniques based on propositional satisfiability solvers that yield novel state-optimal counting algorithms and tight bounds on the exact time complexity of distributed graph colouring.

Pages