Third-Party Aligner for Neural Word Alignments
Jinpeng Zhang, Chuanqi Dong, Xiangyu Duan, Yuqi Zhang, Min Zhang
Abstract
Word alignment is the task of finding translationally equivalent words between source and target sentences. Previous work has demonstrated that self-training can achieve competitive word alignment results. In this paper, we propose to use word alignments generated by a third-party word aligner to supervise neural word alignment training. Specifically, when fine-tuning a pre-trained cross-lingual language model, the source word and target word of each pair aligned by the third-party aligner are trained to be close neighbors in the contextualized embedding space. Experiments on benchmarks for various language pairs show that, surprisingly, our approach can self-correct the third-party supervision by finding more accurate word alignments and deleting wrong ones, outperforming various third-party word aligners, including the current best one. When we integrate the supervision from all of the third-party aligners, we achieve state-of-the-art word alignment performance, with alignment error rates that are on average more than two points lower than those of the best third-party aligner. We released our code at https://github.com/sdongchuanqi/Third-Party-Supervised-Aligner.
- Anthology ID:
- 2022.findings-emnlp.228
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2022
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Editors:
- Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3134–3145
- URL:
- https://aclanthology.org/2022.findings-emnlp.228
- DOI:
- 10.18653/v1/2022.findings-emnlp.228
- Cite (ACL):
- Jinpeng Zhang, Chuanqi Dong, Xiangyu Duan, Yuqi Zhang, and Min Zhang. 2022. Third-Party Aligner for Neural Word Alignments. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 3134–3145, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- Third-Party Aligner for Neural Word Alignments (Zhang et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/landing_page/2022.findings-emnlp.228.pdf
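The abstract describes the training signal only at a high level. What follows is a minimal pure-Python sketch built on explicit assumptions, and it is not the authors' implementation. It assumes that contextualized embeddings have already been extracted from the cross-lingual model; here they are replaced by tiny hand-made vectors. To make third-party-aligned words "close neighbors", it swaps in an InfoNCE-style cross-entropy over cosine similarities, which is only one possible choice. The function names `neighbor_loss`, `extract_alignments`, and `aer`, as well as the `temperature` value, are hypothetical. Links are read off by mutual argmax, which is also an assumed rule. They are then scored with the standard alignment error rate, AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|), where A is the predicted link set and S and P are the sure and possible gold links.

```python
import math
from typing import List, Set, Tuple

Vector = List[float]
Pair = Tuple[int, int]  # (source token index, target token index)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def neighbor_loss(src: List[Vector], tgt: List[Vector], pairs: Set[Pair],
                  temperature: float = 0.1) -> float:
    """HYPOTHETICAL objective (assumed, not taken from the paper): for each
    third-party pair (i, j), a cross-entropy term that ranks target j first
    among all target tokens for source i, plus one that ranks source i first
    among all source tokens for target j. Averaged over directions and pairs."""
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        s2t = [cosine(src[i], t) / temperature for t in tgt]
        t2s = [cosine(s, tgt[j]) / temperature for s in src]
        total += (logsumexp(s2t) - s2t[j]) + (logsumexp(t2s) - t2s[i])
    return total / (2 * len(pairs))


def extract_alignments(src: List[Vector], tgt: List[Vector]) -> Set[Pair]:
    """ASSUMED extraction rule: keep (i, j) only if j is the most similar
    target token for i AND i is the most similar source token for j."""
    sim = [[cosine(s, t) for t in tgt] for s in src]
    fwd = {(i, max(range(len(tgt)), key=lambda j: row[j]))
           for i, row in enumerate(sim)}
    bwd = {(max(range(len(src)), key=lambda i: sim[i][j]), j)
           for j in range(len(tgt))}
    return fwd & bwd


def aer(pred: Set[Pair], sure: Set[Pair], possible: Set[Pair]) -> float:
    """Alignment error rate: 1 - (|A & S| + |A & P|) / (|A| + |S|).
    Sure links also count as possible links (S is a subset of P)."""
    possible = possible | sure
    denom = len(pred) + len(sure)
    if denom == 0:
        return 0.0
    return 1.0 - (len(pred & sure) + len(pred & possible)) / denom
```

Take `src = [[1.0, 0.0], [0.0, 1.0]]` and `tgt = [[0.1, 1.0], [1.0, 0.0]]`. For these inputs, `extract_alignments` returns the crossed links {(0, 1), (1, 0)}, and `neighbor_loss` is smaller for those pairs than for the diagonal pairs {(0, 0), (1, 1)}. Extraction reads the embeddings rather than copying the supervision, so links produced this way need not coincide with the third-party pairs used for training.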