Probabilistic Models for Alignment of Etymological Data

Lecturer: Roman Yangarber
Event type: HIIT seminar
Event time: 2011-04-08 10:15 to 11:00
Kumpula Exactum C222
Talk announcement:
HIIT Seminar Kumpula, Friday April 8, 10:15, Exactum C222

Roman Yangarber
University of Helsinki

Probabilistic Models for Alignment of Etymological Data 

Etymology is the study of the origins of words and of the
relationships and connections among languages.  It involves
many sub-problems, including finding cognates or sets of
genetically related words across a language family,
discovering rules of regular sound correspondence among
the languages, building phylogenetic trees, and
reconstructing hidden data, including proto-languages.  We
focus mainly on the regularity of sound correspondence,
but address some of the others as well.  Given a set of
etymological data, our models try to find the best
alignment at the sound level.  We aim to
devise methods that are as objective as possible, making
no a priori assumptions (e.g., no preference for
vowel-vowel or consonant-consonant alignments).  One of the
goals is to measure the quality of the data sets, in terms
of their internal consistency.  We introduce an MDL-based
initial model and present several extensions.  We also
discuss several ways of evaluating the results,
qualitatively and quantitatively.  The models are
evaluated on data from the Uralic family (which includes
Finnish, Estonian and Hungarian, among other languages).
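
To give a rough feel for what an MDL-style criterion can reward,
the sketch below scores a sound-level alignment by the number of
bits needed to encode its aligned symbol pairs one after another,
using add-one smoothed counts.  It is only an illustration under
stated assumptions, not the model presented in the talk: the
one-to-one alignments without gaps, the function names, the
smoothing scheme and the toy data are all invented here.

```python
import math
from collections import Counter


def prequential_code_length(events, alphabet_size):
    """Bits needed to encode `events` sequentially.

    Each event is coded with probability (count so far + 1) / (n + alphabet_size),
    i.e. add-one smoothing over `alphabet_size` possible event types.
    (Assumed scheme for illustration only.)
    """
    counts = Counter()
    bits = 0.0
    for n, event in enumerate(events):
        p = (counts[event] + 1) / (n + alphabet_size)
        bits -= math.log2(p)
        counts[event] += 1
    return bits


def alignment_cost(aligned_words, n_source_symbols, n_target_symbols):
    """Total code length of all aligned (source, target) symbol pairs."""
    n_pair_types = n_source_symbols * n_target_symbols
    events = [pair for word in aligned_words for pair in word]
    return prequential_code_length(events, n_pair_types)


# Invented toy data: three "word pairs", each already aligned symbol by symbol.
# Source alphabet {p, a}, target alphabet {p, b, a}.
regular = [
    [("p", "b"), ("a", "a")],
    [("p", "b"), ("a", "a")],
    [("p", "b"), ("a", "a")],
]
irregular = [
    [("p", "b"), ("a", "a")],
    [("p", "a"), ("a", "b")],
    [("p", "p"), ("a", "b")],
]

cost_regular = alignment_cost(regular, 2, 3)
cost_irregular = alignment_cost(irregular, 2, 3)
print(f"regular:   {cost_regular:.2f} bits")
print(f"irregular: {cost_irregular:.2f} bits")
```

In this toy setting the data set whose correspondences repeat
consistently gets the shorter code, which is one way the idea of
internal consistency can be turned into a number.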

(Work done under Academy Project Uralink.
Joint work with Hannes Wettig.)

--Matti Järvisalo

Apr  8: Roman Yangarber
Apr 15: * Internal HIIT event *
Apr 22: * No seminar -- Good Friday *
Apr 29: Antti Oulasvirta / Teemu Roos
May  6: Jesper Nederlof
May 13: Haijun Zhou
May 20: *** free ***
May 27: *** free ***

Last updated on 4 Apr 2011 by Matti Järvisalo - Page created on 4 Apr 2011 by Matti Järvisalo