M.Sc. Jussi Määttä defended his doctoral thesis "Model Selection Methods for Linear Regression and Phylogenetic Reconstruction" on Friday, 27 May 2016, at the University of Helsinki. His opponent was Professor Ivo Grosse (Martin Luther University of Halle-Wittenberg, Germany), and the thesis was supervised by Assistant Professor Teemu Roos.
Left: The opponent (on the left) giving his opening statement. Right: The opponent and the candidate (on the right) arguing about technical assumptions. (Photos: Teemu Roos)
Model Selection Methods for Linear Regression and Phylogenetic Reconstruction
Model selection is the task of choosing, from a collection of alternative explanations (often probabilistic models), the one best suited to a given data set. This thesis studies model selection methods in two domains, linear regression and phylogenetic reconstruction, focusing particularly on situations where the amount of available data is either small or very large.
In linear regression, the thesis concentrates on sequential methods for selecting a subset of the variables present in the data. A major result is a proof that the Sequentially Normalized Least Squares (SNLS) method is consistent: if the correct answer (the so-called true model) exists, the method finds it with a probability that approaches one as the amount of data increases. The thesis also introduces a new sequential model selection method that is intermediate between SNLS and the Predictive Least Squares (PLS) method. In addition, it shows how these methods can be used to enhance a novel algorithm for removing noise from images.
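To make the idea of a sequential criterion concrete, here is a minimal Python sketch of the generic Predictive Least Squares (PLS) criterion, assuming NumPy is available. Each candidate subset of columns is scored by the sum of squared one-step-ahead prediction errors, where the observation at step t is predicted from an ordinary least squares fit to the preceding observations only. This is not SNLS or the new intermediate method from the thesis; the function names `pls_score` and `select_subset_pls`, the exhaustive search over subsets, and the common starting row are illustrative assumptions.

```python
import itertools

import numpy as np


def pls_score(X, y, start=None):
    """Sum of squared one-step-ahead prediction errors (the PLS criterion).

    For each t >= start, ordinary least squares is fitted to rows 0..t-1
    only, and the fitted coefficients are used to predict y[t].
    """
    n, k = X.shape
    if start is None:
        start = k  # need at least k past rows for a determined fit
    total = 0.0
    for t in range(start, n):
        beta, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)
        error = y[t] - X[t] @ beta
        total += float(error) ** 2
    return total


def select_subset_pls(X, y):
    """Score every non-empty subset of columns and return the lowest-scoring one."""
    p = X.shape[1]
    start = p  # common starting row so all subsets are scored on the same steps
    best_cols, best_score = None, np.inf
    for size in range(1, p + 1):
        for cols in itertools.combinations(range(p), size):
            score = pls_score(X[:, list(cols)], y, start=start)
            if score < best_score:
                best_cols, best_score = cols, score
    return best_cols, best_score


# Usage on synthetic data where only columns 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)
cols, score = select_subset_pls(X, y)
print("selected columns:", cols)
```

Because every prediction uses only past data, superfluous variables are penalized implicitly through their noisier early estimates, without an explicit penalty term. The exhaustive search shown here is only practical for a handful of candidate variables.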
For phylogenetic reconstruction, that is, the task of inferring ancestral relations from genetic data, the thesis concentrates on the Maximum Parsimony (MP) approach, which seeks the phylogeny (family tree) that minimizes the number of evolutionary changes required. The thesis provides values for various numerical indicators that can be used to assess how much confidence may be placed in the phylogeny reconstructed by MP in various situations where the amount of data is small. These values, obtained through large-scale simulations, highlight the fact that the vast number of possible phylogenies makes a sufficiently large data set necessary. The thesis also extends the so-called skewness test, which is closely related to MP and can be used to reject the hypothesis that a data set is random, possibly indicating the presence of phylogenetic structure.
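The number of evolutionary changes that MP minimizes can be computed for a single fixed tree with Fitch's algorithm. The Python sketch below assumes a rooted binary tree written as nested tuples of taxon names and an alignment given as a dict of equal-length sequences; both representations and the function names are hypothetical. It only scores trees: MP as described above additionally searches the very large space of possible trees for the one with the lowest score.

```python
def fitch_site(tree, states):
    """Fitch's small-parsimony pass for one alignment column.

    `tree` is a taxon name (leaf) or a pair (left, right) of subtrees;
    `states` maps taxon name -> character. Returns (candidate set, changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    # Disjoint child sets force one extra change below this node.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Minimum number of changes needed on `tree`, summed over all columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


alignment = {"A": "AAG", "B": "AAC", "C": "GTC", "D": "GTC"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree1, alignment))  # 3
print(parsimony_score(tree2, alignment))  # 5
```

In this toy example the first tree needs 3 changes and the second needs 5, so MP would prefer the first. With only three sites, however, such a difference says little about confidence, which is the small-data issue the indicators above are meant to quantify.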
Availability of the dissertation
An electronic version of the doctoral dissertation is available on the e-thesis site of the University of Helsinki at http://urn.fi/URN:ISBN:978-951-51-2150-9.
Last updated on 31 May 2016 by Teemu Roos - Page created on 31 May 2016 by Teemu Roos