Culture-aware machine translation: the case study of low-resource language pair Catalan-Chinese

Xixian Liao, Carlos Escolano, Audrey Mash, Francesca De Luca Fornaciari, Javier García Gilabert, Miguel Claramunt Argote, Ella Bohman, Maite Melero


Abstract
High-quality machine translation requires datasets that not only ensure linguistic accuracy but also capture regional and cultural nuances. While many existing benchmarks, such as FLORES-200, rely on English as a pivot language, this approach can overlook the specificity of direct language pairs, particularly for underrepresented combinations like Catalan-Chinese. In this study, we demonstrate that even with a relatively small dataset of approximately 1,000 sentences, we can significantly improve MT localization. To this end, we introduce a dataset specifically designed to enhance Catalan-to-Chinese translation by prioritizing regionally and culturally specific topics. Unlike pivot-based datasets, our data source ensures a more faithful representation of Catalan linguistic and cultural elements, leading to more accurate translations of local terms and expressions. Using this dataset, we demonstrate better performance over the English-pivot FLORES-200 dev set and achieve competitive results on the FLORES-200 devtest set when evaluated with neural-based metrics. We release this dataset as both a human-preference resource and a benchmark for Catalan-Chinese translation. Additionally, we include Spanish translations for each sentence, facilitating extensions to Spanish-Chinese translation tasks.
Anthology ID:
2025.mtsummit-1.12
Volume:
Proceedings of Machine Translation Summit XX: Volume 1
Month:
June
Year:
2025
Address:
Geneva, Switzerland
Editors:
Pierrette Bouillon, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Ana C. Farinha, Marco Gaido, Joke Daems, Dorothy Kenny, Helena Moniz, Sara Szoc
Venue:
MTSummit
Publisher:
European Association for Machine Translation
Pages:
150–161
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.mtsummit-1.12/
Cite (ACL):
Xixian Liao, Carlos Escolano, Audrey Mash, Francesca De Luca Fornaciari, Javier García Gilabert, Miguel Claramunt Argote, Ella Bohman, and Maite Melero. 2025. Culture-aware machine translation: the case study of low-resource language pair Catalan-Chinese. In Proceedings of Machine Translation Summit XX: Volume 1, pages 150–161, Geneva, Switzerland. European Association for Machine Translation.
Cite (Informal):
Culture-aware machine translation: the case study of low-resource language pair Catalan-Chinese (Liao et al., MTSummit 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.mtsummit-1.12.pdf