28.9.2007 HIIT Seminar: Samuel Kaski

HIIT seminars in fall 2007 will be held in hall **B222** of Exactum,
on Fridays starting at 10:15 a.m. Coffee is available from 10:00 a.m.

Sep 28:
  Samuel Kaski
  Relevance in Data Fusion and Visualization

High-throughput measurement data, and the data banks in which they are stored, have brought a new data analysis problem to biology and medicine: how to infer the relevant effects from the mass of data. Each individual information source, be it gene expression, protein interaction, or Gene Ontology, contains unknown amounts and types of noise, that is, irrelevant or uninteresting variation. Distinguishing relevant from irrelevant variation is particularly hard in the initial exploratory stage of "looking at the data", when hypotheses are still vague and there are therefore no strong models to help constrain the exploration.

I will discuss how to make sense of such data masses with information visualization methods, and with machine learning methods designed to bring out relevant clusters and components by fusing several data sources. Supervised mining, or "supervised unsupervised learning", searches for clusters or components that are relevant to, or informative of, classes such as Gene Ontology categories. Learning metrics, discriminative clustering, and discriminative components follow this principle. Mutual dependency mining separates task- or source-specific variation from variation shared by all sources; in defining the yeast stress response, for instance, variation specific to a single treatment is less relevant than the response shared across treatments. Methods of this kind include local and non-parametric dependent components.

Recently we have studied a related problem: how to use auxiliary data sources or background data that are only partially relevant to support inference from too few data samples.

More information at http://www.cis.hut.fi/projects/mi

Last updated on 25 Sep 2007 by Teija Kujala - Page created on 28 Sep 2007 by Teija Kujala