Findings of the CoCo4MT 2023 Shared Task on Corpus Construction for Machine Translation
Ananya Ganesh, Marine Carpuat, William Chen, Katharina Kann, Constantine Lignos, John E. Ortega, Jonne Saleva, Shabnam Tafreshi, Rodolfo Zevallos
Abstract
This paper provides an overview of the first shared task on choosing beneficial instances for machine translation, conducted as part of the CoCo4MT 2023 Workshop at MTSummit. The shared task was motivated by the need to make data annotation for machine translation more efficient, particularly for low-resource languages, for which collecting human translations may be difficult or expensive. The task involved developing methods for selecting the most beneficial instances for training a machine translation system without access to an existing parallel dataset in the target language, so that the selected instances can then be manually translated. Two teams participated in the shared task: the Williams team and the AST team. Submissions were evaluated by training a machine translation model on each submission’s chosen instances and comparing the resulting models' performance using the chrF++ score. The first-ranked system, from the Williams team, finds representative instances by clustering the training data.
- Anthology ID:
- 2023.mtsummit-coco4mt.3
- Volume:
- Proceedings of the Second Workshop on Corpus Generation and Corpus Augmentation for Machine Translation
- Month:
- September
- Year:
- 2023
- Address:
- Macau SAR, China
- Venue:
- MTSummit
- Publisher:
- Asia-Pacific Association for Machine Translation
- Pages:
- 22–27
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2023.mtsummit-coco4mt.3/
- Cite (ACL):
- Ananya Ganesh, Marine Carpuat, William Chen, Katharina Kann, Constantine Lignos, John E. Ortega, Jonne Saleva, Shabnam Tafreshi, and Rodolfo Zevallos. 2023. Findings of the CoCo4MT 2023 Shared Task on Corpus Construction for Machine Translation. In Proceedings of the Second Workshop on Corpus Generation and Corpus Augmentation for Machine Translation, pages 22–27, Macau SAR, China. Asia-Pacific Association for Machine Translation.
- Cite (Informal):
- Findings of the CoCo4MT 2023 Shared Task on Corpus Construction for Machine Translation (Ganesh et al., MTSummit 2023)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2023.mtsummit-coco4mt.3.pdf