2023
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric
Mingda Chen | Paul-Ambroise Duquenne | Pierre Andrews | Justine Kao | Alexandre Mourachko | Holger Schwenk | Marta R. Costa-jussà
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
End-to-end speech-to-speech translation (S2ST) is generally evaluated with text-based metrics. This means that generated speech has to be automatically transcribed, making the evaluation dependent on the availability and quality of automatic speech recognition (ASR) systems. In this paper, we propose a text-free evaluation metric for end-to-end S2ST, named BLASER, to avoid the dependency on ASR systems. BLASER leverages a multilingual multimodal encoder to directly encode the speech segments for source input, translation output and reference into a shared embedding space and computes a score of the translation quality that can be used as a proxy to human evaluation. To evaluate our approach, we construct training and evaluation sets from more than 40k human annotations covering seven language directions. The best results of BLASER are achieved by training with supervision from human rating scores. We show that when evaluated at the sentence level, BLASER correlates significantly better with human judgment compared to ASR-dependent metrics including ASR-SENTBLEU in all translation directions and ASR-COMET in five of them. Our analysis shows that combining speech and text as inputs to BLASER does not increase the correlation with human scores, but best correlations are achieved when using speech, which motivates the goal of our research. Moreover, we show that using ASR for references is detrimental for text-based metrics.
xSIM++: An Improved Proxy to Bitext Mining Performance for Low-Resource Languages
Mingda Chen | Kevin Heffernan | Onur Çelebi | Alexandre Mourachko | Holger Schwenk
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
We introduce a new proxy score for evaluating bitext mining based on similarity in a multilingual embedding space: xsim++. In comparison to xsim, this improved proxy leverages rule-based approaches to extend English sentences in any evaluation set with synthetic, hard-to-distinguish examples which more closely mirror the scenarios we encounter during large-scale mining. We validate this proxy by running a significant number of bitext mining experiments for a set of low-resource languages, and subsequently train NMT systems on the mined data. In comparison to xsim, we show that xsim++ is better correlated with the downstream BLEU scores of translation systems trained on mined bitexts, providing a reliable proxy of bitext mining performance without needing to run expensive bitext mining pipelines. xsim++ also reports performance for different error types, offering more fine-grained feedback for model development.
2022
stopes - Modular Machine Translation Pipelines
Pierre Andrews | Guillaume Wenzek | Kevin Heffernan | Onur Çelebi | Anna Sun | Ammar Kamran | Yingzhe Guo | Alexandre Mourachko | Holger Schwenk | Angela Fan
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Neural machine translation, like other natural language deep learning applications, is hungry for data. As research evolves, the data pipelines supporting that research evolve too, oftentimes re-implementing the same core components. Despite the potential of modular codebases, researchers have but little time to put code structure and reusability first. Unfortunately, this makes it very hard to publish clean, reproducible code to benefit a wider audience. In this paper, we motivate and describe stopes, a framework that addresses these issues while empowering scalability and versatility for research use cases. This library was a key enabler of the No Language Left Behind project, establishing new state-of-the-art performance for a multilingual machine translation model covering 200 languages. stopes and the pipelines described are released under the MIT license at https://github.com/facebookresearch/stopes.
Findings of the WMT’22 Shared Task on Large-Scale Machine Translation Evaluation for African Languages
David Adelani | Md Mahfuz Ibn Alam | Antonios Anastasopoulos | Akshita Bhagia | Marta R. Costa-jussà | Jesse Dodge | Fahim Faisal | Christian Federmann | Natalia Fedorova | Francisco Guzmán | Sergey Koshelev | Jean Maillard | Vukosi Marivate | Jonathan Mbuya | Alexandre Mourachko | Safiyyah Saleem | Holger Schwenk | Guillaume Wenzek
Proceedings of the Seventh Conference on Machine Translation (WMT)
We present the results of the WMT’22 Shared Task on Large-Scale Machine Translation Evaluation for African Languages. The shared task included both a data and a systems track, along with additional innovations, such as a focus on African languages and extensive human evaluation of submitted systems. We received 14 system submissions from 8 teams, as well as 6 data track contributions. We report large progress in the quality of translation for African languages since the last iteration of this shared task: there is an increase of about 7.5 BLEU points across 72 language pairs, and the average BLEU scores went from 15.09 to 22.60.