
Mark Granroth-Wilding: Concept LDA: LDA-like Topic Modelling of Text Using Continuous Word Embeddings

Monday, November 23, 2020, 9:00 AM - 10:00 AM

Abstract: Previous topic models, stemming from Latent Dirichlet Allocation (LDA), have typically been formulated as generative models of discrete words. I present a new Bayesian topic model, Concept LDA, that exploits word embeddings to generalize beyond its training data and to be robust to words unseen at training time. It differs from previous word embedding-based topic models in that its topics are multimodal distributions over the embedding space, bringing it closer to traditional LDA in the type of topic it captures. The result is a model that is more directly applicable as a robust substitute for LDA in the many contexts where LDA has previously been adopted. Our experiments show that the model can learn more coherent topics than LDA and appears to capture a type of topic qualitatively closer to LDA's than Gaussian LDA does, whilst benefiting from the use of word embeddings. We find that Concept LDA learns topics that serve as useful features for an extrinsic classification task, outperforming both LDA and Gaussian LDA. We suggest that, compared with other embedding-based topic models, Concept LDA provides a more suitable replacement for LDA in applications where LDA has previously been used.

Speaker: Mark Granroth-Wilding

Affiliation: Department of Computer Science, University of Helsinki

Place of Seminar: Zoom (Meeting ID: 692 0580 5544, Passcode: 962158)