Abstract
We demonstrate a simple yet effective approach to augmenting training data for multilingual named entity recognition using translations. The named entity spans from the original sentences are transferred to their translations via word alignment and then filtered with the baseline recognizer. The proposed approach outperforms the baseline XLM-RoBERTa on the multilingual dataset.

- Anthology ID: 2023.semeval-1.239
- Volume: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
- Month: July
- Year: 2023
- Address: Toronto, Canada
- Editors: Atul Kr. Ojha, A. Seza Doğruöz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
- Venue: SemEval
- SIG: SIGLEX
- Publisher: Association for Computational Linguistics
- Pages: 1718–1722
- URL: https://aclanthology.org/2023.semeval-1.239
- DOI: 10.18653/v1/2023.semeval-1.239
- Cite (ACL): Alberto Poncelas, Maksim Tkachenko, and Ohnmar Htun. 2023. Sakura at SemEval-2023 Task 2: Data Augmentation via Translation. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 1718–1722, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal): Sakura at SemEval-2023 Task 2: Data Augmentation via Translation (Poncelas et al., SemEval 2023)
- PDF: https://preview.aclanthology.org/proper-vol2-ingestion/2023.semeval-1.239.pdf
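The span-transfer and filtering step summarized in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation. It assumes that token-level word alignments are already available as (source index, target index) pairs from some external aligner, and it represents entities as half-open token spans. It also interprets "filtered with the baseline recognizer" as keeping a translated sentence only when the baseline's predicted spans exactly match the projected ones. The names `project_spans`, `augment`, and the `recognize` callable are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

# A span is (start, end, label) over token indices, with `end` exclusive.
Span = Tuple[int, int, str]
# A word alignment is a list of (source_index, target_index) token pairs
# (assumed to come from an external word aligner).
Alignment = Sequence[Tuple[int, int]]


def project_spans(src_spans: Sequence[Span], alignment: Alignment) -> Optional[List[Span]]:
    """Project source entity spans onto the translation via word alignment.

    Each projected span covers the smallest contiguous range of target tokens
    aligned to any token of the source span. Returns None if some span has
    no aligned target token, so the whole projection is treated as failed.
    """
    projected: List[Span] = []
    for start, end, label in src_spans:
        targets = [t for s, t in alignment if start <= s < end]
        if not targets:
            return None
        projected.append((min(targets), max(targets) + 1, label))
    return projected


def augment(
    src_spans: Sequence[Span],
    tgt_tokens: Sequence[str],
    alignment: Alignment,
    recognize: Callable[[Sequence[str]], Set[Span]],
) -> Optional[List[Span]]:
    """Return entity spans for the translated sentence, or None if it is filtered out.

    `recognize` is a hypothetical stand-in for the baseline NER model: it maps
    a token list to a set of predicted spans. Assumed filter: keep the
    translation only if the baseline predicts exactly the projected spans.
    """
    projected = project_spans(src_spans, alignment)
    if projected is None:
        return None
    # Overlapping projections cannot be encoded as flat (BIO) NER labels.
    ordered = sorted(projected)
    for (_, e1, _), (s2, _, _) in zip(ordered, ordered[1:]):
        if s2 < e1:
            return None
    # Agreement filter with the baseline recognizer.
    if set(projected) != recognize(tgt_tokens):
        return None
    return ordered


if __name__ == "__main__":
    # Source: "Alice visited New York"; translation: "Alice besuchte New York".
    spans = [(0, 1, "PER"), (2, 4, "LOC")]
    tgt = ["Alice", "besuchte", "New", "York"]
    align = [(0, 0), (1, 1), (2, 2), (3, 3)]

    def baseline(tokens: Sequence[str]) -> Set[Span]:
        # Hypothetical baseline predictions for the translated sentence.
        return {(0, 1, "PER"), (2, 4, "LOC")}

    print(augment(spans, tgt, align, baseline))  # [(0, 1, 'PER'), (2, 4, 'LOC')]
```

Taking the min-max range of the aligned target tokens keeps each projected entity contiguous, as BIO tagging requires. A stricter variant could instead reject spans whose aligned target tokens are not contiguous.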