Mining Subsequences with Surprising Event Counts

Lecturer: Jefrey Lijffijt, Aalto University
Event type: HIIT seminar
Event time: 2014-01-27 13:15 to 14:00
Aalto University, Computer Science Building, lecture hall T2

We consider the problem of mining subsequences with surprising event counts, which can be used, for example, to find parts of a text where a word is surprisingly frequent. We introduce a method that finds all fixed-length subsequences of a long data sequence in which the count of an event differs significantly from what is expected. In estimating what is expected, we have to take into account that many subsequences are considered concurrently. Existing methods for taking this into account are either computationally very demanding or do not account for any dependency structure.
We try to account for the dependency structure directly by analysing the joint distribution of the patterns. This turns out to be difficult, so we introduce a simple and efficiently computable upper bound that can be used instead. We provide empirical evidence that the upper bound is more powerful than existing alternatives, and we demonstrate the utility of the method in experiments on two types of data: text and DNA.
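
To make the problem setting concrete, the following is a minimal sketch of a naive baseline, not the method presented in the talk. It slides a window of fixed length over a sequence, tests each window's event count against a binomial null model, and applies a Bonferroni correction, which is one of the simple approaches that ignores the dependency between overlapping windows. The function names, the binomial null model, the one-sided (upper-tail) test, and the default alpha are all assumptions made for illustration.

```python
from math import comb


def binomial_upper_tail(k, m, p):
    """Return P(X >= k) for X ~ Binomial(m, p)."""
    return sum(comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(k, m + 1))


def surprising_windows(seq, event, m, alpha=0.05):
    """Naive baseline (assumed for illustration, not the talk's upper bound).

    Returns (start, count, p_value) for every length-m window of seq in which
    `event` is surprisingly frequent after a Bonferroni correction.
    """
    n = len(seq)
    if m <= 0 or m > n:
        raise ValueError("window length m must satisfy 0 < m <= len(seq)")

    # Null model: each position is the event independently with the
    # overall event frequency of the sequence.
    p = sum(1 for x in seq if x == event) / n

    # Bonferroni: divide alpha by the number of windows tested.
    # This ignores that overlapping windows are strongly dependent.
    num_windows = n - m + 1
    threshold = alpha / num_windows

    count = sum(1 for x in seq[:m] if x == event)
    hits = []
    for start in range(num_windows):
        if start > 0:
            # Update the count incrementally as the window slides by one.
            count += (seq[start + m - 1] == event) - (seq[start - 1] == event)
        p_value = binomial_upper_tail(count, m, p)
        if p_value <= threshold:
            hits.append((start, count, p_value))
    return hits
```

Because the correction treats all windows as independent tests, this baseline tends to be conservative; accounting for the overlap between windows is the gap that the abstract's upper bound is meant to address.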

Last updated on 21 Jan 2014 by Antti Ukkonen - Page created on 21 Jan 2014 by Antti Ukkonen