Mariia Anisimova
2026
Challenges in Machine Translation of Interactive Multimodal Exercises
Lucie Polakova | Miroslav Hrabal | Věra Kloudová | Michal Novák | Mariia Anisimova | Martin Popel
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
This paper describes linguistic and technological challenges encountered within an applied project aimed at expanding a large e-learning portal from its original Czech to three other languages: Ukrainian, English and German. Although there seems to be a general belief that machine translation is a solved task in 2026, we show that translating educational content, which in our case is highly terminological, multimodal, interactive and encoded in XML, brings many challenges of different types, some easily solvable and some not. We also compare our results from the early phase of the project (Transformer-based machine translation) with those obtained after the switch to LLM-based translation methods. We show that the two MT methods are prone to different types of errors, some of which are quite new (such as the undesired correction of counterfactual statements) and require new ways of handling them. The resulting four-language edition of the educational web portal will be freely available to educators, students and researchers by the end of 2026.
2024
Charles Translator: A Machine Translation System between Ukrainian and Czech
Martin Popel | Lucie Polakova | Michal Novák | Jindřich Helcl | Jindřich Libovický | Pavel Straňák | Tomas Krabac | Jaroslava Hlavacova | Mariia Anisimova | Tereza Chlanova
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
We present Charles Translator, a machine translation system between Ukrainian and Czech, developed as part of a society-wide effort to mitigate the impact of the Russian-Ukrainian war on individuals and society. The system was developed in the spring of 2022 with the help of many language data providers in order to quickly meet the demand for such a service, which was not available at the time in the required quality. The translator was later implemented as an online web interface and as an Android app with speech input, both featuring Cyrillic-Latin script transliteration. The system translates directly, in contrast to other available systems that use English as a pivot, and thus takes advantage of the typological similarity of the two languages. It uses the block back-translation method, which allows for efficient use of monolingual training data. The paper describes the development process, including data collection, implementation and evaluation, mentions several use cases, and outlines possibilities for further development of the system for educational purposes.
Attitudes in Diplomatic Speeches: Introducing the CoDipA UNSC 1.0
Mariia Anisimova | Šárka Zikánová
Proceedings of the 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation @ LREC-COLING 2024
This paper presents CoDipA UNSC 1.0, a Corpus of Diplomatic Attitudes of the United Nations Security Council annotated with the attitude part of Appraisal theory. The speeches were manually selected according to topic-related and temporal criteria. The texts were then annotated according to a predefined annotation scenario. The distinguishing features of diplomatic texts require a modified approach to attitude evaluation, which is implemented and presented in this work. The corpus analysis has shown diplomatic speeches to be consistently evaluative, offered an overview of the most prominent means of expressing subjectivity in the corpus, and provided the results of the inter-annotator agreement evaluation.