How machine translation can help during the Ukraine crisis

The invasion of Ukraine has activated support efforts around the world. At the University of Helsinki, professor of language technology Jörg Tiedemann realized the acute need for Ukrainian language support. “In the ongoing crisis, initiatives to develop language processing tools to help refugees are hugely important,” he says. As coordinator of the special interest group on language, speech and cognition at FCAI, Tiedemann is especially focused on making sure that artificial intelligence helps humans communicate not only with machines, but also with each other. 

Some of these initiatives are part of the European Language Grid, a consortium of scientists and R&D tool developers who also curate a repository of resources for Ukrainian. Tiedemann has gathered a list of Ukrainian language tools, including a chatbot, online translation sites and some of his research group’s own work like OpusMT, an open-source machine translation framework. Ready-to-use models can be downloaded and used in any application or domain. An example is Translate Locally, where you can translate directly within your browser and run translation models like OpusMT. “This can be done offline and doesn’t send data to the cloud or any other online service. Everything stays on your local machine, so it’s very good for your privacy,” says Tiedemann. 

Despite the ubiquity and efficiency of these tools, there are still obstacles to effective machine translation or speech recognition. “Good user interfaces are needed. This is crucial, especially for end-users who expect services to be immediately accessible on common devices like mobile phones. And we always need more language data.” The machine learning models can only be as good as the data they learn from, which come from various sources like movie subtitles, legislative texts or other publicly available collections. “But this may not cover specific questions coming from refugees or issues related to health, the news or getting support. We need more domain-specific data,” explains Tiedemann. And to evaluate the quality of translation, native speakers are needed, especially those who understand both languages. Language pairs that don’t involve English are particularly important. Models for languages with fewer speakers or fewer resources tend to have worse translation quality because of the limited training data. This map shows current gaps.

While Tiedemann hopes that language technology can play an immediate part in supporting those in need during emergency situations, he is also looking forward to holding a research event in Helsinki on April 29. Delayed two years due to COVID-19, UnGroundNLP-2022 will gather scientists in the fields of natural language processing, cognitive science, speech technology and general machine learning. The workshop is supported by the Finnish Center for Artificial Intelligence (FCAI).

Read more about how the University of Helsinki and Aalto University are supporting students and researchers from Ukraine.