Maud Bénard

2024

The improvements in neural machine translation make translation and post-editing pipelines ever more effective for a wider range of applications. In this paper, we evaluate the effectiveness of such a pipeline for the translation of scientific documents (limited here to article abstracts). Using a dedicated interface, we collect, then analyse the post-edits of approximately 350 abstracts (English→French) in the Natural Language Processing domain for two groups of post-editors: domain experts (academics encouraged to post-edit their own articles) on the one hand and trained translators on the other. Our results confirm that such pipelines can be effective, at least for high-resource language pairs. They also highlight differences between the post-editing strategies of the two groups. Finally, they suggest that term translation is the most pressing issue for improving fully automatic translation, but that in a post-editing setup, other error types can be equally annoying for post-editors.
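The abstract does not say how post-edits were quantified. One common measure of post-editing effort is an HTER-style rate: the number of word-level edits needed to turn the MT output into its post-edited version, divided by the length of the post-edited text. The sketch below is a simplified, hypothetical version (plain Levenshtein distance over whitespace tokens, without the block shifts that TER also counts); it is not presented as the method used in the paper, and the function names are illustrative only.

```python
def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Minimum number of word insertions, deletions and substitutions turning hyp into ref."""
    prev = list(range(len(ref) + 1))  # distance from the empty hypothesis prefix
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete word h from the MT output
                curr[j - 1] + 1,         # insert word r from the post-edit
                prev[j - 1] + (h != r),  # substitute (cost 0 if the words match)
            )
        prev = curr
    return prev[len(ref)]


def post_edit_rate(mt_output: str, post_edit: str) -> float:
    """Hypothetical HTER-like rate: edits divided by post-edit length (no shifts)."""
    hyp = mt_output.split()
    ref = post_edit.split()
    return word_edit_distance(hyp, ref) / max(len(ref), 1)


mt = "le modèle traduit les résumés"
pe = "le système traduit les résumés scientifiques"
print(word_edit_distance(mt.split(), pe.split()))  # one substitution + one insertion
print(round(post_edit_rate(mt, pe), 3))
```

Comparing such rates per group of post-editors is one simple way to contrast how heavily experts and translators edit; a full analysis would also need to look at which edits were made, not only how many.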

2023

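Utiliser les syntagmes nominaux complexes anglais pour évaluer la robustesse des systèmes de traduction anglais-français en langue de spécialité [Using English Complex Noun Phrases to Evaluate the Robustness of English-French Machine Translation Systems for Specialised Language]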
Maud Bénard
Proceedings of CORIA-TALN 2023. Proceedings of the 16th Rencontres Jeunes Chercheurs en RI (RJCRI) and the 25th Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL)

We argue that analysing the errors made when translating complex noun phrases is a useful way to evaluate the robustness of English-French machine translation systems for specialised language. These syntactic constructions raise both syntactic and lexical issues that make them a major obstacle to comprehension and production for non-native speakers of English. We contend that such analyses would help ensure that MT systems meet the linguistic requirements of the end users for whom they are intended.

This contribution presents the MaTOS (Machine Translation for Open Science) project, which aims to develop new methods for the full machine translation (MT) of scientific documents between French and English, as well as automatic metrics to evaluate the quality of the resulting translations. To this end, MaTOS focuses on (a) collecting open resources for specialised MT; (b) describing textual coherence markers in scientific articles; (c) developing new multilingual document-processing methods; and (d) metrics that measure progress in the translation of complete documents.

Investigating Techniques for a Deeper Understanding of Neural Machine Translation (NMT) Systems through Data Filtering and Fine-tuning Strategies
Lichao Zhu | Maria Zimina | Maud Bénard | Behnoosh Namdar | Nicolas Ballier | Guillaume Wisniewski | Jean-Baptiste Yunès
Proceedings of the Eighth Conference on Machine Translation

In the context of this biomedical shared task, we implemented data filters to improve the selection of relevant fine-tuning data from the available training data sources. Specifically, we employed textometric analysis to detect repetitive segments within the test set, which we then used to refine the training data used to fine-tune the mBart-50 baseline model. Through this approach, we aim to achieve several objectives: developing a practical fine-tuning strategy for training biomedical in-domain fr<>en models, defining criteria for filtering in-domain training data, and comparing model predictions and fine-tuning data with respect to the test set to gain deeper insight into how Neural Machine Translation (NMT) systems work.
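In textometry, repeated segments are word sequences that occur more than once in a corpus. The abstract does not give the exact filtering criteria, so the sketch below rests on stated assumptions: lowercased whitespace tokenisation, segments of 2 to 4 words occurring at least twice in the test-set source sentences, and a training pair is kept if its source side contains at least one such segment. All function names and thresholds are hypothetical, not the authors' implementation.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous word sequences of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def repeated_segments(sentences: list[str], min_len: int = 2, max_len: int = 4,
                      min_freq: int = 2) -> set[tuple[str, ...]]:
    """Hypothetical repeated-segment detector: n-grams seen at least min_freq times."""
    counts: Counter = Counter()
    for sent in sentences:
        tokens = sent.lower().split()
        for n in range(min_len, max_len + 1):
            counts.update(ngrams(tokens, n))
    return {seg for seg, c in counts.items() if c >= min_freq}


def filter_training_pairs(pairs: list[tuple[str, str]], segments: set[tuple[str, ...]],
                          min_len: int = 2, max_len: int = 4) -> list[tuple[str, str]]:
    """Keep (source, target) pairs whose source shares at least one repeated segment."""
    kept = []
    for src, tgt in pairs:
        tokens = src.lower().split()
        grams = {g for n in range(min_len, max_len + 1) for g in ngrams(tokens, n)}
        if grams & segments:
            kept.append((src, tgt))
    return kept


test_sources = [
    "the patients were randomly assigned",
    "patients were randomly assigned to two groups",
    "adverse events were reported",
]
train_pairs = [
    ("Participants were randomly assigned to groups",
     "Les participants ont été répartis aléatoirement en groupes"),
    ("The weather is nice today", "Il fait beau aujourd'hui"),
]
segs = repeated_segments(test_sources)
print(filter_training_pairs(train_pairs, segs))  # only the first pair is kept
```

Here only the clinical-trial sentence survives because it shares "were randomly assigned" with the test set; the selected pairs would then serve as the fine-tuning data for the baseline model.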