Kurt Abela


2024

COMET for Low-Resource Machine Translation Evaluation: A Case Study of English-Maltese and Spanish-Basque
Júlia Falcão | Claudia Borg | Nora Aranberri | Kurt Abela
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Trainable metrics for machine translation evaluation have been scoring the highest correlations with human judgements in the latest meta-evaluations, outperforming traditional lexical overlap metrics such as BLEU, which is still widely used despite its well-known shortcomings. In this work we look at COMET, a prominent neural evaluation system proposed in 2020, to analyze the extent of its language support restrictions and to investigate strategies to extend this support to new, under-resourced languages. Our case study focuses on English-Maltese and Spanish-Basque. We run a crowd-based evaluation campaign to collect direct assessments and use the annotated dataset to evaluate COMET-22, fine-tune it further, and train COMET models from scratch for the two language pairs. Our analysis suggests that COMET’s performance can be improved with fine-tuning, and that COMET can be highly susceptible to the distribution of scores in the training data, which especially impacts low-resource scenarios.

2023

UM-DFKI Maltese Speech Translation
Aiden Williams | Kurt Abela | Rishu Kumar | Martin Bär | Hannah Billinghurst | Kurt Micallef | Ahnaf Mozib Samin | Andrea DeMarco | Lonneke van der Plas | Claudia Borg
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

For the 2023 IWSLT Maltese Speech Translation Task, UM-DFKI jointly presents a cascade solution which achieves 0.6 BLEU. While this is the first time that a Maltese speech translation task has been released by IWSLT, this paper explores previous solutions for other speech translation tasks, focusing primarily on low-resource scenarios. Moreover, we present our method of fine-tuning XLS-R models for Maltese ASR using a collection of multilingual speech corpora, as well as the fine-tuning of the mBART model for Maltese-to-English machine translation.