Maarit Koponen


2024

Effects of different types of noise in user-generated reviews on human and machine translations including ChatGPT
Maja Popovic | Ekaterina Lapshinova-Koltunski | Maarit Koponen
Proceedings of the Ninth Workshop on Noisy and User-generated Text (W-NUT 2024)

This paper investigates the effects of noisy source texts (containing spelling and grammar errors, informal words or expressions, etc.) on human and machine translations, namely whether the noisy phenomena are kept in the translations, corrected, or cause errors. The analysed data consists of English user reviews of Amazon products translated into Croatian, Russian and Finnish by professional translators, translation students, machine translation (MT) systems, and the ChatGPT language model. The results show that overall, ChatGPT and professional translators mostly correct or standardise the noisy parts, while students often keep them. Furthermore, MT systems are the most prone to errors, while ChatGPT is more robust but notably less robust than human translators. Finally, some of the phenomena are particularly challenging for both MT systems and ChatGPT, especially spelling errors and informal constructions.

2023

Proceedings of the 24th Annual Conference of the European Association for Machine Translation
Mary Nurminen | Judith Brenner | Maarit Koponen | Sirkku Latomaa | Mikhail Mikhailov | Frederike Schierl | Tharindu Ranasinghe | Eva Vanmassenhove | Sergi Alvarez Vidal | Nora Aranberri | Mara Nunziatini | Carla Parra Escartín | Mikel Forcada | Maja Popovic | Carolina Scarton | Helena Moniz
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

Do Humans Translate like Machines? Students’ Conceptualisations of Human and Machine Translation
Salmi Leena | Aletta G. Dorst | Maarit Koponen | Katinka Zeven
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

This paper explores how students conceptualise the processes involved in human and machine translation, and how they describe the similarities and differences between them. The paper presents the results of a survey involving university students (B.A. and M.A.) taking a course on translation who filled out an online questionnaire distributed in Finnish, Dutch and English. Our study finds that students often describe both human translation and machine translation in similar terms, suggesting they do not sufficiently distinguish between them and do not fully understand how machine translation works. The current study suggests that training in Machine Translation Literacy may need to focus more on the conceptualisations involved and how conceptual and vernacular misconceptions may affect how translators understand human and machine translation.

Computational analysis of different translations: by professionals, students and machines
Maja Popovic | Ekaterina Lapshinova-Koltunski | Maarit Koponen
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

In this work, we analyse different translated texts in terms of various text features. We compare two types of human translations, professional and student, and machine translation outputs in terms of lexical and grammatical variety, sentence length, as well as frequencies of different POS tags and POS trigrams. Our experiments are carried out on parallel translations into three languages, Croatian, Finnish and Russian, all originating from the same English source texts. Our results indicate that machine translations are closest to the source text, followed by student translations. Also, student translations are similar to both professional translations and MT, sometimes even more so to MT. Furthermore, we identify sets of features which are convenient for distinguishing machine from human translations.

DECA: Democratic epistemic capacities in the age of algorithms
Maarit Koponen | Mary Nurminen | Nina Havumetsä | Juha Lång
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

The DECA project consortium investigates epistemic capacities, defined as an individual’s access to reliable knowledge, their ability to participate in knowledge production, and society’s capacity to make informed, sustainable policy decisions. In this paper, we focus specifically on the parts of the project examining the challenges posed by multilinguality in these processes and the potential role of MT in supporting access to, and production of, knowledge.

2022

DiHuTra: a Parallel Corpus to Analyse Differences between Human Translations
Ekaterina Lapshinova-Koltunski | Maja Popović | Maarit Koponen
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper describes a new corpus of human translations which contains both professional and student translations. The data consists of English sources (texts from news and reviews) and their translations into Russian and Croatian, as well as a subcorpus containing translations of the review texts into Finnish. All target languages are mid-resourced and less or mid-investigated. The corpus will be valuable for studying variation in translation as it allows a direct comparison between human translations of the same source texts. The corpus will also be a valuable resource for evaluating machine translation systems. We believe that this resource will facilitate understanding and improvement of the quality issues in both human and machine translation. In the paper, we describe how the data was collected, provide information on translator groups and summarise the differences between the human translations at hand based on our preliminary results with shallow features.

Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
Helena Moniz | Lieve Macken | Andrew Rufener | Loïc Barrault | Marta R. Costa-jussà | Christophe Declercq | Maarit Koponen | Ellie Kemp | Spyridon Pilos | Mikel L. Forcada | Carolina Scarton | Joachim Van den Bogaert | Joke Daems | Arda Tezcan | Bram Vanroy | Margot Fonteyne
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

DiHuTra: a Parallel Corpus to Analyse Differences between Human Translations
Ekaterina Lapshinova-Koltunski | Maja Popović | Maarit Koponen
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

This project aimed to design a corpus of parallel human translations (HTs) of the same source texts by professionals and students. The resulting corpus consists of English news and reviews source texts, their translations into Russian and Croatian, and translations of the reviews into Finnish. The corpus will be valuable for both studying variation in translation and evaluating machine translation (MT) systems.

LITHME: Language in the Human-Machine Era
Maarit Koponen | Kais Allkivi-Metsoja | Antonio Pareja-Lora | Dave Sayers | Márta Seresi
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

The LITHME COST Action brings together researchers from various fields of study focusing on language and technology. We present the overall goals of LITHME and the network’s working groups focusing on diverse questions related to language and technology. As an example of the work of the LITHME network, we discuss the working group on language work and language professionals.

2020

MT for subtitling: User evaluation of post-editing productivity
Maarit Koponen | Umut Sulubacak | Kaisa Vitikainen | Jörg Tiedemann
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

This paper presents a user evaluation of machine translation and post-editing for TV subtitles. Based on a process study where 12 professional subtitlers translated and post-edited subtitles, we compare effort in terms of task time and number of keystrokes. We also discuss examples of specific subtitling features like condensation, and how these features may have affected the post-editing results. In addition to overall MT quality, segmentation and timing of the subtitles are found to be important issues to be addressed in future work.

MT for Subtitling: Investigating professional translators’ user experience and feedback
Maarit Koponen | Umut Sulubacak | Kaisa Vitikainen | Jörg Tiedemann
Proceedings of 1st Workshop on Post-Editing in Modern-Day Translation

2018

The WMT’18 Morpheval test suites for English-Czech, English-German, English-Finnish and Turkish-English
Franck Burlot | Yves Scherrer | Vinit Ravishankar | Ondřej Bojar | Stig-Arne Grönroos | Maarit Koponen | Tommi Nieminen | François Yvon
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

Progress in the quality of machine translation output calls for new automatic evaluation procedures and metrics. In this paper, we extend the Morpheval protocol introduced by Burlot and Yvon (2017) for the English-to-Czech and English-to-Latvian translation directions to three additional language pairs, and report its use to analyze the results of WMT 2018’s participants for these language pairs. Considering additional, typologically varied source and target languages also enables us to draw some generalizations regarding this morphology-oriented evaluation procedure.

2015

How to teach machine translation post-editing? Experiences from a post-editing course
Maarit Koponen
Proceedings of the 4th Workshop on Post-editing Technology and Practice

2013

This translation is not too bad: an analysis of post-editor choices in a machine-translation post-editing task
Maarit Koponen
Proceedings of the 2nd Workshop on Post-editing Technology and Practice

2012

Post-editing time as a measure of cognitive effort
Maarit Koponen | Wilker Aziz | Luciana Ramos | Lucia Specia
Workshop on Post-Editing Technology and Practice

Post-editing machine translations has been attracting increasing attention both as a common practice within the translation industry and as a way to evaluate Machine Translation (MT) quality via edit distance metrics between the MT and its post-edited version. Commonly used metrics such as HTER are limited in that they cannot fully capture the effort required for post-editing. In particular, the cognitive effort required may vary for different types of errors and may also depend on the context. We suggest post-editing time as a way to assess some of the cognitive effort involved in post-editing. This paper presents two experiments investigating the connection between post-editing time and cognitive effort. First, we examine whether sentences with long and short post-editing times involve edits of different levels of difficulty. Second, we study the variability in post-editing time and other statistics among editors.

Comparing human perceptions of post-editing effort with post-editing operations
Maarit Koponen
Proceedings of the Seventh Workshop on Statistical Machine Translation