Maria Afara

2022

pdf abs
Investigating automatic and manual filtering methods to produce MT-ready glossaries from existing ones
Maria Afara | Randy Scansani | Loïc Dugast
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

Commercial Machine Translation (MT) providers offer functionalities that allow users to leverage bilingual glossaries. This poses the question of how to turn glossaries that were intended to be used by a human translator into MT-ready ones, removing entries that could harm the MT output. We present two automatic filtering approaches - one based on rules and the second one relying on a translation memory - and a manual filtering procedure carried out by a linguist. The resulting glossaries are added to the MT model. The outputs are compared against a baseline where no glossary is used and an output produced using the original glossary. The present work aims at investigating if any of these filtering methods can bring a higher terminology accuracy without negative effects on the overall quality. Results are measured with terminology accuracy and Translation Edit Rate. We test our filters on two language pairs, En-Fr and De-En. Results show that some of the automatically filtered glossaries improve the output when compared to the baseline, and they may help reach a better balance between accuracy and overall quality, replacing the costly manual process without quality loss.

Co-authors

Randy Scansani 1
Loic Dugast 1

Venues

eamt1