Culture-aware machine translation: the case study of low-resource language pair Catalan-Chinese
Xixian Liao, Carlos Escolano, Audrey Mash, Francesca De Luca Fornaciari, Javier García Gilabert, Miguel Claramunt Argote, Ella Bohman, Maite Melero
Abstract
High-quality machine translation requires datasets that not only ensure linguistic accuracy but also capture regional and cultural nuances. While many existing benchmarks, such as FLORES-200, rely on English as a pivot language, this approach can overlook the specificity of direct language pairs, particularly for underrepresented combinations like Catalan-Chinese. In this study, we demonstrate that even with a relatively small dataset of approximately 1,000 sentences, we can significantly improve MT localization. To this end, we introduce a dataset specifically designed to enhance Catalan-to-Chinese translation by prioritizing regionally and culturally specific topics. Unlike pivot-based datasets, our data source ensures a more faithful representation of Catalan linguistic and cultural elements, leading to more accurate translations of local terms and expressions. Using this dataset, we demonstrate better performance over the English-pivot FLORES-200 dev set and achieve competitive results on the FLORES-200 devtest set when evaluated with neural-based metrics. We release this dataset as both a human-preference resource and a benchmark for Catalan-Chinese translation. Additionally, we include Spanish translations for each sentence, facilitating extensions to Spanish-Chinese translation tasks.
- Anthology ID:
- 2025.mtsummit-1.12
- Volume:
- Proceedings of Machine Translation Summit XX: Volume 1
- Month:
- June
- Year:
- 2025
- Address:
- Geneva, Switzerland
- Editors:
- Pierrette Bouillon, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Ana C. Farinha, Marco Gaido, Joke Daems, Dorothy Kenny, Helena Moniz, Sara Szoc
- Venue:
- MTSummit
- Publisher:
- European Association for Machine Translation
- Pages:
- 150–161
- URL:
- https://preview.aclanthology.org/mtsummit-25-ingestion/2025.mtsummit-1.12/
- Cite (ACL):
- Xixian Liao, Carlos Escolano, Audrey Mash, Francesca De Luca Fornaciari, Javier García Gilabert, Miguel Claramunt Argote, Ella Bohman, and Maite Melero. 2025. Culture-aware machine translation: the case study of low-resource language pair Catalan-Chinese. In Proceedings of Machine Translation Summit XX: Volume 1, pages 150–161, Geneva, Switzerland. European Association for Machine Translation.
- Cite (Informal):
- Culture-aware machine translation: the case study of low-resource language pair Catalan-Chinese (Liao et al., MTSummit 2025)
- PDF:
- https://preview.aclanthology.org/mtsummit-25-ingestion/2025.mtsummit-1.12.pdf