Make it Digital!

Event type: 
Event
Event time: 
2016-08-18 11:45 to 18:00
Place: 
Aalto University, Otakaari 1M, Espoo
Description: 

Come and hear the latest news about digitalisation and its untapped opportunities from leading academic and industrial experts. Explore the exhibition and meet startups, corporations and organisations in the field of digitalisation. Come and network.

Please register no later than 8 August 2016. The event is free of charge.

Welcome!

See the more detailed programme and registration on the event page.

HIIT Kumpula Seminar: SCOT modeling, parallel training and statistical inference

Lecturer : 
Mikhail Malyutov
Event type: 
HIIT seminar
Event time: 
2016-05-27 10:15 to 11:00
Place: 
Exactum B119
Description: 

Abstract:

A Stochastic COntext Tree (abbreviated as SCOT) is an m-Markov chain in which every state of a string is independent of the symbols in its more remote past than the context, whose length is determined by the preceding symbols of this state. SCOT has also appeared in other fields under various names (VLMC, PST, CTW) for compression applications. Consistency of algorithms for training SCOT has been derived for stationary time series with mixing.
We survey recent advances in SCOT modeling, parallel training and statistical inference described in chapter 3 of B. Ryabko, J. Astola and M. Malyutov, "Compression-Based Methods of Statistical Analysis and Prediction of Time Series", Springer, which is to appear shortly.
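
As a rough illustration of the variable-length context idea in the abstract, the following Python sketch predicts the next symbol from the longest suffix of the past that appears in a small context table. The contexts, the probabilities and the function names are hypothetical and are not taken from the talk or the book; training and inference are not shown.

```python
# Hypothetical toy context tree over the alphabet {"a", "b"}.
# Keys are contexts (most recent symbol last); values are next-symbol probabilities.
CONTEXTS = {
    "a": {"a": 0.7, "b": 0.3},
    "ab": {"a": 0.2, "b": 0.8},
    "bb": {"a": 0.5, "b": 0.5},
}


def find_context(past, contexts):
    """Return the longest suffix of `past` that is a known context."""
    for length in range(len(past), 0, -1):
        suffix = past[-length:]
        if suffix in contexts:
            return suffix
    raise KeyError("no context matches the given past")


def next_symbol_probs(past, contexts=CONTEXTS):
    """Next-symbol distribution, which depends only on the matched context."""
    return contexts[find_context(past, contexts)]
```

In this toy table a past ending in "a" needs only one symbol of memory, while a past ending in "b" needs two, which is the sense in which the context length is determined by the preceding symbols.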

Bio:

Mikhail Malyutov, Professor of Applied Statistics, Northeastern University, Boston. On his sabbatical he gave talks at many UK and Australian universities. Before 1995, he was with the Kolmogorov Statistical Lab, Moscow.

The First Europe-China Workshop on Big Data Management

Event type: 
Workshop
Event time: 
2016-05-16 09:00 to 16:30
Place: 
B222, Exactum, Kumpula Campus
Description: 

Big Data has become ubiquitous in modern society. It challenges state-of-the-art data acquisition, computation and analysis methods. 

This workshop aims to gather experts in big data management to exchange views on cutting-edge data management problems. It also aims to create opportunities for strengthening existing collaborations and establishing new ones. The workshop will thus allow attendees to build relations with Chinese and European researchers for potential future grant applications to Horizon 2020, the Chinese NSFC, etc.

No registration is needed. You are welcome to join us!

The list of speakers and program can be found here

Word Associations as a Language Model for Generative and Creative Tasks

Event type: 
Doctoral dissertation
Respondent: 
Oskar Gross
Opponent: 
Professor Timo Honkela (University of Helsinki)
Custos: 
Professor Hannu Toivonen (University of Helsinki)
Event time: 
2016-05-06 12:00 to 14:00
Place: 
University of Helsinki Main Building, Auditorium XIV (Unioninkatu 34, 3rd floor)
Description: 

M.Sc. Oskar Gross will defend his doctoral thesis Word Associations as a Language Model for Generative and Creative Tasks on Friday the 6th of May 2016 at 12 o'clock in the University of Helsinki Main Building, Auditorium XIV (Unioninkatu 34, 3rd floor). His opponent is Professor Timo Honkela (University of Helsinki) and his custos is Professor Hannu Toivonen (University of Helsinki). The defence will be held in English.

Word Associations as a Language Model for Generative and Creative Tasks

A common approach to analysing natural language and gaining a better understanding of documents is to build a language model: a structured representation of language that can then be used for further analysis or generation. This thesis focuses on a fairly simple language model based on associations between words that appear together in the same sentence. We revisit the classic idea of analysing word co-occurrences statistically and propose a simple parameter-free method for extracting common word associations, i.e. associations between words that are often used in the same context (e.g., Batman and Robin). Additionally, we propose a method for extracting associations that are specific to a document or a set of documents. The idea behind the method is to take the common word associations into account and highlight those word associations that co-occur in the document unexpectedly often.
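
To make the sentence-level co-occurrence idea concrete, here is a minimal Python sketch that counts how often words share a sentence and scores a pair with pointwise mutual information (PMI). The announcement does not say which statistic the thesis uses, so PMI is a stand-in chosen for illustration, and the sample sentences and function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: one string per sentence.
SENTENCES = [
    "batman and robin fight crime",
    "robin helps batman",
    "the city sleeps",
    "crime never sleeps",
]


def cooccurrence_counts(sentences):
    """Count in how many sentences each word and each unordered word pair occurs."""
    word_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))
    return word_counts, pair_counts


def pmi(w1, w2, word_counts, pair_counts, n_sentences):
    """Pointwise mutual information (base 2) of two words sharing a sentence."""
    pair = tuple(sorted((w1, w2)))
    joint = pair_counts[pair] / n_sentences
    if joint == 0:
        return float("-inf")  # the words never share a sentence
    p1 = word_counts[w1] / n_sentences
    p2 = word_counts[w2] / n_sentences
    return math.log2(joint / (p1 * p2))


WORD_COUNTS, PAIR_COUNTS = cooccurrence_counts(SENTENCES)
```

A positive score means the pair shares a sentence more often than independence would predict. A document-specific variant could compare counts from one document against such background counts, but that step is not sketched here.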

We will empirically show that these models can be used in practice for at least three tasks: generation of creative combinations of related words, document summarization, and creating poetry.

First, the common word association language model is used for solving a test of creativity -- the Remote Associates Test. Then observations of the properties of the model are used further to generate creative combinations of words -- sets of words which are not mutually related, but do share a common related concept.

Document summarization is a task where a system has to produce a short summary of a text with a limited number of words. In this thesis, we propose a method which utilises the document-specific associations and basic graph algorithms to produce summaries which give competitive performance in various languages. The document-specific associations are also used to produce poetry which is related to a certain document or a set of documents. The idea is to use documents as inspiration for generating poems which could potentially be used as commentary on news stories.

Empirical results indicate that both the common and the document-specific associations can be used effectively for different applications. This provides us with a simple language model which could be used for different languages.

Availability of the dissertation

An electronic version of the doctoral dissertation is available on the e-thesis site of the University of Helsinki.

Printed copies will be available on request from Oskar Gross: oskar.gross@gmail.com.

Cover Song Identification Using Compression-based Distance Measures

Event type: 
Doctoral dissertation
Respondent: 
Teppo E. Ahonen
Opponent: 
Professor Petri Toiviainen (University of Jyväskylä)
Custos: 
Professor Esko Ukkonen (University of Helsinki)
Event time: 
2016-04-01 12:00 to 14:00
Place: 
University of Helsinki Exactum Building, Auditorium CK112 (Gustaf Hällströmin katu 2b)
Description: 

Summary:

Measuring similarity in music data is a problem with various potential applications. In recent years, the task known as cover song identification has gained widespread attention. In cover song identification, the purpose is to determine whether a piece of music is a different rendition of a previous version of the composition. The task is quite trivial for a human listener, but highly challenging for a computer.

This research approaches the problem from an information-theoretic starting point. Assuming that cover versions share musical information with the original performance, we strive to measure the degree of this common information as the amount of computational resources needed to turn one version into another. Using a similarity measure known as normalized compression distance, we approximate the non-computable Kolmogorov complexity as the length of an object when compressed using a real-world data compression algorithm. If two pieces of music share musical information, we should be able to compress one using a model learned from the other.
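
The normalized compression distance mentioned above is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C gives the compressed length. The Python sketch below is a minimal illustration that uses zlib as the compressor, which is an assumption: the summary does not say which compressors the thesis evaluates. The byte strings are hypothetical stand-ins for discretised music features.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (approximates Kolmogorov complexity)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical toy data: chord-like symbol sequences and unrelated noise.
original = b"C G Am F " * 50
cover = b"C G Am F " * 40 + b"C G F F " * 10
rng = random.Random(0)
unrelated = bytes(rng.getrandbits(8) for _ in range(450))
```

Comparing `ncd(original, cover)` with `ncd(original, unrelated)` shows the intended behaviour: sequences that share structure compress well together and get a smaller distance.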

In order to use compression-based similarity measuring, the meaningful musical information needs to be extracted from the raw audio signal data. The most commonly used representation for this task is known as a chromagram: a sequence of real-valued vectors describing the temporal tonal content of the piece of music. Measuring the similarity between two chromagrams effectively with a data compression algorithm requires further processing to extract relevant features and find a more suitable discrete representation for them. Here, the challenge is to process the data without losing the distinguishing characteristics of the music.
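
As a deliberately crude illustration of turning a chromagram into a discrete sequence that a compressor can work on, the Python sketch below maps each 12-dimensional chroma vector to the symbol of its strongest pitch class. This is an assumed toy quantisation, not the representation studied in the thesis, and the symbol alphabet and function name are hypothetical.

```python
# One character per semitone starting from C; lowercase marks the sharp above.
PITCH_CLASSES = "CcDdEFfGgAaB"


def chroma_to_symbols(chromagram):
    """Map each 12-dimensional chroma vector to the symbol of its strongest pitch class."""
    symbols = []
    for frame in chromagram:
        if len(frame) != 12:
            raise ValueError("each chroma vector must have 12 entries")
        strongest = max(range(12), key=lambda i: frame[i])
        symbols.append(PITCH_CLASSES[strongest])
    return "".join(symbols)
```

The resulting string can be encoded to bytes and passed to a compressor. Keeping only the strongest pitch class discards most of the harmonic content, which is exactly the kind of information loss the paragraph above warns about.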

In this research, we study the difficult nature of cover song identification and search for an effective compression-based system for the task. Harmonic and melodic features, different representations for them, commonly used data compression algorithms, and several other variables of the problem are addressed thoroughly. The research seeks to shed light on how different choices in the scheme contribute to the performance of the system. Additional attention is paid to combining different features, with several combination strategies studied. Extensive empirical evaluation of the identification system has been performed, using large sets of real-world music data.

Evaluations show that compression-based similarity measuring performs relatively well but fails to achieve the accuracy of the existing solution that measures similarity by using common subsequences. The best compression-based results are obtained by a combination of distances based on two harmonic representations obtained from chromagrams using hidden Markov model chord estimation, and an octave-folded version of the extracted salient melody representation. The most distinct reason for the shortfall in compression performance is the scarce amount of data available for a single piece of music. This was partially overcome by internal data duplication. As a whole, the process is solid and provides a practical foundation for an information-theoretic approach to cover song identification.
