Summer 2013 internship topics with HIIT-wide focus area

These topics are based on combining expertise from a number of HIIT groups working in algorithmics, machine learning, bioinformatics and computational systems biology. Together we develop new algorithms and methods that will help solve tomorrow's most advanced bioinformatics challenges.

1. Machine learning for modelling biological pathways

Antti Honkela, Brandon Malone, Sohan Seth

In this summer project, you will refine existing statistical machine learning models and develop novel algorithms for modelling biochemical reaction pathways. Understanding which biochemical reactions are performed by the organisms that live in a specific environment is a fundamental step in ecosystem preservation and engineering. While identifying active reactions is easy, understanding which organisms implement each reaction is still a challenging problem. Currently, biologists approach this problem using ad hoc, suboptimal heuristics. For the first time, we are tackling this problem by adapting rigorous machine learning tools, such as Bayesian modelling, and by exploiting multiple DNA samples from similar environments. A related topic (topic 2 below) focuses on incorporating biological information from existing databases into the models.


2. Bioinformatics for modelling biological pathways

Antti Honkela, Sohan Seth, Brandon Malone

In this project, you will explore available public databases and incorporate existing information to design sensible priors for statistical models of biological pathways. The effectiveness of Bayesian learning depends strongly on the choice of an appropriate prior. An informative prior improves both convergence and interpretability, so it is essential that we do better than making a 'best' guess. A related topic (topic 1 above) deals with building such models.
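
To illustrate why the prior matters when data are scarce, here is a minimal sketch using a Beta-Binomial model. It is not the model used in the project: the function names, the pseudo-count scheme, and the example numbers (a reaction reported in 8 of 10 related organisms in some database) are all hypothetical and chosen only for illustration.

```python
def prior_from_database(present, total, strength=1.0):
    """Turn hypothetical database counts into Beta(a, b) pseudo-counts.

    Assumption: a reaction annotated in `present` out of `total` related
    organisms adds that many pseudo-observations to a flat Beta(1, 1) prior.
    `strength` scales how much the database evidence is trusted.
    """
    a = 1.0 + strength * present
    b = 1.0 + strength * (total - present)
    return a, b


def posterior_mean(a, b, successes, trials):
    """Posterior mean of a Beta(a, b) prior after binomial observations."""
    return (a + successes) / (a + b + trials)


# Only two samples are available, and the reaction is detected in one of them.
flat = posterior_mean(1.0, 1.0, successes=1, trials=2)

# Informative prior built from the (hypothetical) database counts.
a, b = prior_from_database(present=8, total=10)
informed = posterior_mean(a, b, successes=1, trials=2)

print(flat)                # 0.5
print(round(informed, 3))  # 0.714
```

With only two observations the flat prior leaves the estimate at 0.5, whereas the database-derived prior pulls it towards the frequency seen in related organisms. How strongly such external evidence should be weighted (the `strength` parameter here) is exactly the kind of design question the project addresses.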


3. Inferring microbial interaction networks

Antti Honkela, Fabio Cunial

Like human social networks, interactions among bacteria in a complex ecosystem can be represented as directed graphs whose vertices stand for species and whose arcs encode some measure of interaction intensity. Inferring such graphs from DNA samples is a fundamental step for understanding and engineering ecosystems. In this project, you will design, implement and test a pipeline of tools to go from multiple DNA samples to the species/sample matrices required for network inference. This pipeline should become a fast and flexible software component that can be adapted to other similar problems we are solving (for example, building the reaction/sample matrices required for functional analysis of bacterial communities).
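
As a rough sketch of what the final step of such a pipeline might produce, the following builds a species/sample count matrix. It assumes, purely for illustration, that an earlier step has already assigned each DNA read to a species label; the function name, the input format and the example labels are hypothetical.

```python
from collections import Counter


def species_sample_matrix(assignments):
    """Build a species-by-sample count matrix.

    `assignments` maps a sample name to the list of species labels assigned
    to its reads (assumed output of an upstream classification step).
    Returns (species, samples, matrix), where matrix[i][j] is the number of
    reads of species[i] observed in samples[j].
    """
    samples = sorted(assignments)
    counts = {name: Counter(labels) for name, labels in assignments.items()}
    species = sorted({sp for c in counts.values() for sp in c})
    # A Counter returns 0 for species that are absent from a sample.
    matrix = [[counts[name][sp] for name in samples] for sp in species]
    return species, samples, matrix


reads = {
    "sample_A": ["E. coli", "E. coli", "B. subtilis"],
    "sample_B": ["B. subtilis"],
}
species, samples, matrix = species_sample_matrix(reads)
print(species)  # ['B. subtilis', 'E. coli']
print(samples)  # ['sample_A', 'sample_B']
print(matrix)   # [[1, 1], [2, 0]]
```

Replacing the species labels with reaction identifiers would give the reaction/sample matrix mentioned above, which is one reason to keep this component independent of the labelling step.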

Last updated on 29 Jan 2013 by Antti Honkela - Page created on 27 Jan 2013 by Antti Honkela