CoSCo Projects

Ongoing Projects

The Finnish Centre of Excellence in Computational Inference Research (COIN)

  • Duration: 1.1.2012-31.12.2017
  • Funding: Academy of Finland
  • Project leaders: Erkki Oja (director), Samuel Kaski (co-director), Ilkka Niemelä, Erik Aurell, Jorma Laaksonen, Jukka Corander, and Petri Myllymäki.
  • Key words: massive data, multiple data sources, highly structured stochastic models, extreme inference.
  • Abstract:

    The Finnish Centre of Excellence in Computational Inference Research (COIN) develops methods for transforming the data produced by the current data revolution into useful information. The key methodology for achieving this goal is statistical and computational inference based on the data. The emphasis is on large data collections and computationally demanding modelling and inference algorithms. Our mission is to push the boundary towards both more complex problems, requiring more structured data models, and towards extremely rapid inference. COIN brings in expertise on several different approaches to inference, with a unique opportunity to address the core computational challenges with combinations of machine learning, computational statistics, statistical physics, and constraint-based search and optimization.

    We will work on two flagship applications. In the Intelligent Information Access flagship, the challenge is to make use of massive interrelated information sources, whether in everyday life or in science, and select what information to present to the user. The inference needs to be done on-line, learning relevance from the user's responses. In the Computational Biology and Medicine flagship, we develop methods for maximally utilizing the novel measurement databases and structured stochastic models in making data-driven biology cumulative. In addition to these two flagship applications, we work on a few additional test-bench applications in collaboration with selected top-level application partners, from science and industry.

Data to Intelligence (D2I)

  • Duration: 01.02.2012-31.01.2016
  • Funding: TEKES/Digile (one of the Finnish Strategic Centres for Science, Technology and Innovation)
  • Project leader: Petri Myllymäki
  • Key words: Big data, intelligent systems, machine learning, probabilistic models
  • Abstract: The number of devices capable of automatically gathering and storing digital data is increasing fast: our mobile phones, home appliances, digital televisions, cars, industrial process monitoring systems, email clients, web browsers, social media applications, traffic and security cameras, and numerous other sources of digital information produce vast masses of data all the time. Global trend setters like Google, Yahoo, Netflix, Amazon and Autonomy have already shown that it is possible to transform data to economic value by producing novel, immensely popular and profitable services based on intelligent analysis of massive data sets. Nevertheless, new user-centric approaches and cooperative organization networks require ever more intelligent ways to utilize the available data: the necessary information should be available quickly and automatically and be based, e.g., on the current user role, and the context and process requirements and perspectives. This means that information sources often cross traditional organizational borders and may also draw on open data reserves.

    Another problem is caused by the fact that the data is often not only big, but it is also parceled, consisting of potentially several data sources that may contain heterogeneous data types. The nature of this type of data makes it very difficult to retrieve relevant pieces of data or information in a given context, in particular when the links between the different data elements in different data sources are not explicit but implicit, and have to be inferred with the help of the constructed models.

    Furthermore, the data may not only be big (and potentially parceled), but it is often also extremely high-dimensional, which makes it difficult to understand the underlying phenomena. What is needed is a rich toolbox of models, methods and algorithms for representing the information extracted from the raw data in such a manner that the results help the user to understand the domain better, and support decision-making processes by helping in drawing conclusions about future events and in estimating their probabilities.

    The mission of the D2I program is to support the global trend, and contribute to such emerging ecosystems that boost Finnish international competitiveness through intelligent (context-sensitive, personalized, proactive) data processing technologies linked to new data-driven services that add measurable value, leading to increased knowledge, comfort, productivity or effectiveness. The target is reached by developing intelligent methods and tools for managing, refining and utilizing diverse data sources, and by creating new, innovative data-intensive business models and services based on these methods.

Revolution of Knowledge Work (Re:Know)

  • Duration: 01.09.2013-31.12.2014 (1st funding period)
  • Funding: TEKES (as a large strategic initiative)
  • Project leaders: Seven HIIT group leaders, Patrik Floréen in charge
  • Key words: Big data, symbiotic human-computer interfaces
  • Abstract:  We combine the multidisciplinary world-class expertise in machine learning, human-computer interaction, distributed computing, cognitive neuroergonomics and human factors at work, available within Helsinki Institute for Information Technology HIIT and the Finnish Institute of Occupational Health. Our objective is to develop Symbiotic Human-Information Interfaces, which pave the way for a revolution of knowledge work.

    Symbiotic Human-Information Interfaces combine heterogeneous data sources and utilize the context of use and user actions to determine, jointly with the user, what information is most likely relevant, and provide the user with a new type of interactive and proactive interface to the data. In the context of knowledge work, we use our know-how on both computational principles and how humans process information to develop a new information management and utilization paradigm, enabling humans and computers to support each other optimally. Symbiotic Human-Information Interfaces will revolutionize information seeking and the further cultivation of information into new knowledge.

Modeling of multiple data sources: Integration, visualization, retrieval and interaction (Multivire)

  • Duration: 1.1.2012-31.12.2014
  • Funding: Academy of Finland, as part of the special call within thematic areas of Strategic Centres, theme area: "Methods and applications of data reserves"
  • Project leaders: Samuel Kaski, Petri Myllymäki, Giulio Jacucci and Antti Oulasvirta
  • Key words: Augmented science, data reserves, human-computer interaction, information retrieval, information visualization, machine learning, multi-modal interfaces, probabilistic modeling, ubiquitous computing
  • Abstract: Information technology is being radically changed by the rapid growth of the amount of automatically stored data. As the emerging data sets are additionally increasingly heterogeneous, what is needed now is a new generation of intelligent methods for processing, analyzing and understanding this type of multi-source "big data". We have formed a multi-disciplinary research consortium with the unique advantage that all members work in the same institute (HIIT), which has recently started a new strategic focus area, "Computationally Supported Collective Science" or "augmented science", with an excellent match to this project. The main objective of the project is to develop methods for modeling multiple data sources. The main research questions are how to integrate multiple, heterogeneous data sources, how to retrieve relevant information from the resulting models, how to represent the extracted information in a form that is understandable and useful to humans, and how to make the whole process interactive so that it can be integrated as an efficient element of different data processing activities. The focus of the project is on basic research and the purpose is to develop generic methods that can be applied in various domains.

Recently Finished Projects

Virtual Intelligent Space for Collaborative Innovation (VISCI)

  • Duration: 1.1.2009-31.12.2012
  • Funding: Academy of Finland (via the Motive Programme) and Tekes
  • Project leaders: Petri Myllymäki and Patrik Floréen
  • Key words: Collaborative interaction and innovation, virtual teams, intelligent information technology, technology-enabled interaction, collaboration and co-creation of knowledge, interdisciplinary research approach
  • Abstract: As education and business today are increasingly globalized, new challenges are facing communication, interaction, and collaboration for the advancement and creation of innovation. There is a clear need for new knowledge about ubiquitous computing, communication, and knowledge co-construction processes in virtual teams as well as about the possibilities of modern information technology to support virtual knowledge co-creation and innovation. The project team will develop and test novel technologies that facilitate social interaction and collaborative knowledge creation for learning and innovation. It will explore how adaptive and personalized technology works as an enabler in learning and innovation processes, encouraging creativity and self-expression among individuals. It will produce new knowledge on how individuals from different backgrounds interact, communicate and collaborate in virtual spaces to co-construct knowledge and to create innovations through emerging technologies.

Tools for Virtual Collaborative Innovation (VISCI TOOLS)

  • Duration: 1.1.2010-30.04.2012
  • Funding: Tekes
  • Project leaders: Petri Myllymäki and Patrik Floréen
  • Key words: Collaborative interaction and innovation, virtual teams, intelligent information technology, technology-enabled interaction, collaboration and co-creation of knowledge, interdisciplinary research approach
  • Abstract: In close collaboration with the user and service organizations, the multidisciplinary project will develop prototypes for novel ICT-enabled tools and the accompanying processes that increase the efficiency of virtual collaborative innovation. The project will iteratively specify, develop and validate the prototypes and their accompanying novel processes, proceeding through three parallel work packages: 1) Research and co-develop collaborative virtual innovation processes, and specify their special requirements for ICT-enabled collaborative tools and spaces; 2) Take these requirements to the co-development of virtual space prototypes; 3) Validate with the users the virtual space prototypes and the new processes that apply them. The consortium consists of a global user company and innovative service developer and tester organizations. This network will lead to discovery of new business opportunities and business models. All software that is developed during the project will be open source, and thus freely exploitable by the emerging software service industry. The management principles of the distributed innovation processes that apply virtual collaborative spaces will be published for wider industrial dissemination. The research teams have also started a parallel basic research project VISCI in the MOTIVE research program of the Academy of Finland, and this basic research multiplies the results of the VISCI Tools project.

The Helsinki Privacy Experiment

  • Duration: 1.1.2010-31.12.2012
  • Funding: Academy of Finland
  • Project leaders: Petri Myllymäki and Antti Oulasvirta
  • Key words: Ubicomp, surveillance, privacy, ethical practices
  • Abstract: More and more data is already collected on our everyday lives, yet we have little understanding of how that data is used; or, even worse, how it could be used. In many cases the data usage scenarios are personally intrusive and can have negative social consequences, particularly if they are incorrect or uncontrollable. The Helsinki Privacy Experiment (HPE) studies the issue of privacy in ubiquitous computing by playing the role of advocatus diaboli: HPE arranges experimental conditions in which the hard limits to the problem - both technological and human - can be properly charted, thus providing vital information on the importance and role of privacy in the information society of tomorrow.

Adaptive Interfaces for Consumer Applications (AICA)

  • Duration: 1.11.2009-31.03.2012
  • Funding: Tekes (via the Ubicom programme)
  • Project leaders: Petri Myllymäki and Patrik Floréen
  • Key words: Ubiquitous computing, context-awareness, personalization
  • Abstract: Context-awareness and personalisation are key enablers for the ubiquitous computing vision. From the end user's point of view, they are essential for managing the complexity of future services and applications. The goal of this project is to enable adaptive interfaces to information by combining context-awareness with personalisation and to perform user acceptance studies on a number of prototypes for ubicom consumer applications. We will continue working on our intelligent mobile shopping assistant Massive. We will also work on new interaction methods (e.g., UbiLight, a prototype of a system using hand movement as an interactive spatial interaction method). We aim at ideas that could be commercialised in a few years' time.

    The research groups involved in the project cover a large spectrum of scientific and technical expertise and the project partners include different types of stakeholders: a retail chain, mobile technology providers, a teleoperator and representatives of diverse user groups. In addition to their financial support, the project partners also contribute their own technology and expertise. Our main pilot area is K-Citymarket Ruoholahti.

Applications of the MDL Principle to Prediction and Model Selection and Testing (MODEST)

  • Duration: 1.1.2009-31.12.2012
  • Funding: Academy of Finland
  • Project leader: Professor Petri Myllymäki
  • Key words: MDL, normalized maximum likelihood, universal models, variable order Markov chains, model selection, model testing
  • Abstract:

    The main application areas of statistical inference are model selection, hypothesis testing, and prediction. In model selection the goal typically is to increase our understanding of a problem area by means of data analysis, data mining, and information extraction tools, and in hypothesis testing we estimate the validity of a certain hypothesis about the problem. In prediction, the task, of course, is to estimate the probability of some unknown quantity, which typically is temporally located in the future. To perform these statistical inference/machine learning tasks we need a theoretically solid framework that is logically correct and satisfies certain reasonable optimality criteria, while at the same time providing computationally feasible methods for practical applications. Information theory offers an excellent foundation for such a framework.

    Our earlier pioneering work on information-theoretic statistical inference, in which the Minimum Description Length (MDL) principle plays a central role, has recently spurred an influx of new ideas, problems, and extensions. We believe that the new ideas lead to theoretically and practically significant advances in MDL-based modeling. The goal of the project is to study these issues further, focusing on four research areas: sequentially normalized universal models, optimally distinguishable models, extensions of the structure function, and non-stationary modeling. In addition to theoretical advances in these areas, we will develop new algorithms suitable for practical model selection, testing, and prediction tasks, and empirically demonstrate their validity using both artificial and real-world data sets from various domains.

  • Abstract (translated from the Finnish version):

    The most important application areas of statistical inference are model selection, hypothesis testing, and prediction. In model selection, the goal is to increase our understanding of the problem domain by analyzing or "mining" data, or by using other means of filtering out the relevant information. In hypothesis testing, in turn, the validity of a given hypothesis is assessed. Prediction is, of course, about estimating the probability of a given unknown quantity, which usually relates to an event located in the future. Solving these statistical inference/machine learning problems requires a theoretically reliable framework that is logically consistent and satisfies certain rational optimality criteria, while also offering computationally efficient methods for practical applications. Information theory offers an excellent foundation for such a theoretical framework.

    Our earlier pioneering work on information-theoretic statistical inference, in which the MDL principle plays a central role, has recently given rise to a veritable tidal wave of new ideas, problems, and extensions. We believe that these new ideas will lead to significant theoretical and practical advances in modeling based on the MDL principle. The purpose of the project is to study these new ideas in more detail, focusing on the following four research areas: sequentially normalized universal models, optimal distinguishability of models, extensions of the structure function, and non-stationary modeling. In these areas we develop both new theoretical results and practical algorithms that can be used in model selection, hypothesis testing, and prediction. The suitability of the developed algorithms for these tasks is demonstrated by empirical experiments using both artificial and natural data sets from many different problem domains.

Algorithmic Methods in Stemmatology (STAM)

  • Duration: 1.1.2009-31.12.2011
  • Funding: University of Helsinki Research Funds
  • Project leader: Dr. Teemu Roos
  • Key words: stemmatology, textual criticism, phylogenetics
  • Abstract:

    Given a collection of imperfect copies of a textual document, the aim of stemmatology is to reconstruct the history of the text, indicating for each variant the source text from which it was copied. The project develops theory and methods for computer-assisted stemmatology, and evaluates the accuracy of such methods on simulated and real data sets.

    Stemmatology lies at the intersection of several scientific disciplines. It is associated, on the one hand, with the humanities, which are largely based on using texts as sources; on the other hand, with mathematics, statistics, and computer science; and finally, with evolutionary biology and cladistics, the study of evolution and speciation. The aim of traditional stemmatology, or textual criticism, has been to infer the original content of a textual source based on a number of different versions. Modern computer-assisted stemmatology has proven to be an extremely powerful tool not only for the study of the alteration of texts but also for giving insight into the way the texts have been distributed geographically. In doing so, stemmatology is answering several central questions in historical, philological, and theological research.

    Our objective is to develop reliable methods and tools for the study of the origins, variation, and distribution of texts. An easy-to-use method available on the internet, based on a sound methodology, would significantly benefit a large group of scholars in a variety of humanistic disciplines. In computer science, applications include, e.g., the study of computer viruses and chain letters. Advances in methods for textual scholarship also contribute to cladistics and evolutionary biology.

  • Abstract (translated from the Finnish version):

    Stemmatology lies at the intersection of several disciplines. It is connected, first, to the humanities that use texts as their sources; second, to mathematics, statistics and computer science; and third, to cladistics, the branch of evolutionary biology that studies the order of speciation of animals. Traditionally, the goal of stemmatology (or, in older terms, textual criticism) has been considered to be establishing the original textual content of a written source from a large set of different versions. Modern computer-assisted stemmatology has, however, been found to be a highly effective tool also for studying the history of the development and spread of texts, and is thus able to answer several quite central questions in historical research, philology and theology.

    Our goal is to develop more reliable methods than before and, on their basis, practical tools for studying the content, origin, developmental history and spread of texts. A reliable stemmatological method and an easy-to-use tool developed by our group and available on the internet would significantly help a large group of researchers in the various humanities. In computer science, practical application areas of the method include the study of computer viruses and chain letters. Developing a method designed for the needs of textual research also contributes directly to the methodology of cladistics, which studies changes in the genome.


EU Network of Excellence in Pattern Analysis, Statistical Modelling and Computational Learning (PASCAL)

  • Duration: 01.12.2003-29.02.2013
  • Funding: EU
  • Project leader: Professor Petri Myllymäki
  • Key words: machine learning, pattern analysis, statistical modelling
  • Abstract:

    The objective is to build a Europe-wide Distributed Institute which will pioneer principled methods of pattern analysis, statistical modelling and computational learning as core enabling technologies for multimodal interfaces that are capable of natural and seamless interaction with and among individual human users.

    At each stage in the process, machine learning has a crucial role to play. It is proving an increasingly important tool in Machine Vision, Speech, Haptics, Brain Computer Interfaces, Information Extraction and Natural Language Processing; it provides a uniform methodology for multimodal integration; it is an invaluable tool in information extraction; while on-line learning provides the techniques needed for adaptively modelling the requirements of individual users. Though machine learning has such potential to improve the quality of multimodal interfaces, significant advances are needed, in both the fundamental techniques and their tailoring to the various aspects of the applications, before this vision can become a reality.

    The institute will foster interaction between groups working on fundamental analysis including statisticians and learning theorists; algorithms groups including members of the non-linear programming community; and groups in machine vision, speech, haptics, brain-computer interfaces, natural language processing, information-retrieval, textual information processing and user modelling for computer human interaction, groups that will act as bridges to the application domains and end-users.

  • Abstract (translated from the Finnish version):

    Pascal is an EU-funded research network (Network of Excellence) comprising 57 European research institutions. The Department of Computer Science at the University of Helsinki is one of the network's thirteen core sites, and the University of Helsinki's representative, Petri Myllymäki, has a seat on the network's steering committee. Numerous researchers and postgraduate students of the Department of Computer Science from outside the CoSCo group also take an active part in the network's activities.

    The basic idea of the network is to bring together the top experts in statistical modelling and machine learning in Europe. On the network's home page, the goal is formulated as follows: The objective is to create a Europe-wide distributed research institute that develops principled methods of pattern analysis, statistical modelling and computational learning, which enable the development of multimodal user interfaces capable of natural and seamless interaction.

    Machine learning has a crucial role at every stage of the process. It has proven to be an important tool in machine vision, speech recognition, haptics, brain-computer interfaces, information extraction and natural language processing. It provides a uniform methodology for multimodal integration. It is an invaluable tool in information extraction, and online learning provides the technique needed for adaptively modelling the requirements of individual users. Despite the potential of machine learning for improving the quality of multimodal interfaces, significant advances are still needed, both in the fundamental techniques and in tailoring them to the many requirements of the applications, before the visions can be realized.

    The institute will foster interaction among groups working on fundamental analysis, such as statisticians and learning theorists; algorithm researchers, especially from the area of non-linear programming; research groups in machine vision, speech recognition, haptics, brain-computer interfaces, natural language processing, information retrieval, textual information processing and user modelling; and groups that act as intermediaries to the application domains and end-users.

Earlier CoSCo Projects

Last updated on 22 Jan 2014 by Petri Myllymäki - Page created on 18 Sep 2012 by Petri Myllymäki