Machine Learning Coffee Seminar – Dorota Glowacka – Generating Captions for Search Results with Lexical and Semantic Features
17.6.2019 @ 09:00 - 10:00
In exploratory search, users try to acquire new knowledge or investigate novel data sets. Exploratory search is challenging: users’ lack of domain knowledge can impair their ability to assess the quality of search results. Moreover, as a search proceeds, accumulated feedback can lead to presented documents that are no longer related to the initial query. Users’ information needs are assumed to be highly dynamic and are therefore expected to evolve over time, but this evolution is difficult to distinguish from query drift, where improper feedback has adversely affected the results. Unfortunately, in either scenario, users engaged in exploratory search can be unaware of their changing search trajectory.
With these issues in mind, we present Exploratory Search Captions (ESC), an approach inspired by research in image caption generation. ESC is an ensemble method that generates captions by combining semantic and lexical features. Semantic information is derived from a novel sequence-to-sequence autoencoder that performs representation learning on ranked search results. ESC aims to provide a succinct description of ranked search results, both to assure users that their search is proceeding as intended and to alert them when it is not. In the context of scientific literature search, ESC generated 26% more relevant captions than methods from the query expansion literature, according to an expert evaluator. The other methods were comparatively noisy, producing many more captions that were too generic to be useful.
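As a rough illustration of what combining lexical and semantic features can look like, the Python sketch below ranks candidate caption terms by a weighted sum of a TF-IDF-style lexical score and a cosine-similarity semantic score. It is not the ESC implementation: the sequence-to-sequence autoencoder is replaced by hand-written toy vectors, and the function names, the weight alpha, and the example data are all assumptions made for this sketch.

```python
import math
from collections import Counter


def lexical_scores(results, corpus):
    """TF-IDF-style score: frequent in the results, rare in the corpus."""
    tf = Counter(word for doc in results for word in doc.split())
    n = len(corpus)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for doc in corpus if term in doc.split())
        scores[term] = count * math.log((n + 1) / (df + 1))
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def normalise(scores):
    """Scale scores so that the largest value becomes 1.0."""
    top = max(scores.values(), default=0.0)
    return {t: (s / top if top > 0 else 0.0) for t, s in scores.items()}


def caption_terms(results, corpus, term_vectors, result_vector, alpha=0.5, k=3):
    """Rank terms by alpha * lexical + (1 - alpha) * semantic score."""
    lex = normalise(lexical_scores(results, corpus))
    combined = {}
    for term, lex_score in lex.items():
        vec = term_vectors.get(term)
        # Terms without a vector get no semantic credit; negatives are clipped.
        sem_score = max(cosine(vec, result_vector), 0.0) if vec else 0.0
        combined[term] = alpha * lex_score + (1 - alpha) * sem_score
    # Sort by score (descending), breaking ties alphabetically.
    return sorted(combined, key=lambda t: (-combined[t], t))[:k]


# Toy data (hypothetical): two ranked results and a small background corpus.
results = ["neural caption models", "caption generation models"]
corpus = results + ["search models", "query models", "search query"]

# Stand-ins for learned representations; a real system would obtain these
# from a trained model rather than writing them by hand.
term_vectors = {
    "caption": [1.0, 0.0],
    "generation": [0.8, 0.6],
    "neural": [0.0, 1.0],
    "models": [0.6, 0.8],
}
result_vector = [1.0, 0.0]

print(caption_terms(results, corpus, term_vectors, result_vector, k=2))
```

Setting alpha to 1.0 reduces the ranking to the purely lexical score, which makes it easy to see how much the semantic component changes the selected terms.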
Speaker: Dorota Glowacka
Affiliation: Professor of Computer Science, University of Helsinki