Miguel Claramunt Argote


2025

Culture-aware machine translation: the case study of low-resource language pair Catalan-Chinese
Xixian Liao | Carlos Escolano | Audrey Mash | Francesca De Luca Fornaciari | Javier García Gilabert | Miguel Claramunt Argote | Ella Bohman | Maite Melero
Proceedings of Machine Translation Summit XX: Volume 1

High-quality machine translation requires datasets that not only ensure linguistic accuracy but also capture regional and cultural nuances. While many existing benchmarks, such as FLORES-200, rely on English as a pivot language, this approach can overlook the specificity of direct language pairs, particularly for underrepresented combinations like Catalan-Chinese. In this study, we demonstrate that even with a relatively small dataset of approximately 1,000 sentences, we can significantly improve MT localization. To this end, we introduce a dataset specifically designed to enhance Catalan-to-Chinese translation by prioritizing regionally and culturally specific topics. Unlike pivot-based datasets, our data source ensures a more faithful representation of Catalan linguistic and cultural elements, leading to more accurate translations of local terms and expressions. Using this dataset, we demonstrate better performance over the English-pivot FLORES-200 dev set and achieve competitive results on the FLORES-200 devtest set when evaluated with neural-based metrics. We release this dataset as both a human-preference resource and a benchmark for Catalan-Chinese translation. Additionally, we include Spanish translations for each sentence, facilitating extensions to Spanish-Chinese translation tasks.