Abstract
Word alignments are essential for a variety of NLP tasks. Therefore, choosing the best approaches for their creation is crucial. However, the scarce availability of gold evaluation data makes the choice difficult. We propose SilverAlign, a new method to automatically create silver data for the evaluation of word aligners by exploiting machine translation and minimal pairs. We show that performance on our silver data correlates well with gold benchmarks for 9 language pairs, making our approach a valid resource for evaluation of different languages and domains when gold data is not available. This addresses the important scenario of missing gold data alignments for low-resource languages.

- Anthology ID:
- 2024.lrec-main.1290
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- Publisher:
- ELRA and ICCL
- Pages:
- 14812–14825
- URL:
- https://aclanthology.org/2024.lrec-main.1290
- Cite (ACL):
- Abdullatif Koksal, Silvia Severini, and Hinrich Schütze. 2024. SilverAlign: MT-Based Silver Data Algorithm for Evaluating Word Alignment. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 14812–14825, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- SilverAlign: MT-Based Silver Data Algorithm for Evaluating Word Alignment (Koksal et al., LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2024.lrec-main.1290.pdf