Kurt Abela
2024
COMET for Low-Resource Machine Translation Evaluation: A Case Study of English-Maltese and Spanish-Basque
Júlia Falcão | Claudia Borg | Nora Aranberri | Kurt Abela
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Trainable metrics for machine translation evaluation have achieved the highest correlations with human judgements in the latest meta-evaluations, outperforming traditional lexical overlap metrics such as BLEU, which is still widely used despite its well-known shortcomings. In this work we look at COMET, a prominent neural evaluation system proposed in 2020, to analyze the extent of its language support restrictions and to investigate strategies to extend this support to new, under-resourced languages. Our case study focuses on English-Maltese and Spanish-Basque. We run a crowd-based evaluation campaign to collect direct assessments and use the annotated dataset to evaluate COMET-22, to fine-tune it further, and to train COMET models from scratch for the two language pairs. Our analysis suggests that COMET’s performance can be improved with fine-tuning, and that COMET can be highly susceptible to the distribution of scores in the training data, which especially impacts low-resource scenarios.
2023
UM-DFKI Maltese Speech Translation
Aiden Williams | Kurt Abela | Rishu Kumar | Martin Bär | Hannah Billinghurst | Kurt Micallef | Ahnaf Mozib Samin | Andrea DeMarco | Lonneke van der Plas | Claudia Borg
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
For the 2023 IWSLT Maltese Speech Translation Task, UM-DFKI jointly presents a cascade solution that achieves 0.6 BLEU. While this is the first time that a Maltese speech translation task has been released by IWSLT, this paper explores previous solutions for other speech translation tasks, focusing primarily on low-resource scenarios. Moreover, we present our method of fine-tuning XLS-R models for Maltese ASR using a collection of multilingual speech corpora, as well as the fine-tuning of the mBART model for Maltese-to-English machine translation.
Co-authors
- Claudia Borg 2
- Júlia Falcão 1
- Nora Aranberri 1
- Aiden Williams 1
- Rishu Kumar 1