Towards Fast and Accurate Modeling for Cross-Lingual Label Projection
Thang Le, Huy Huu Nguyen, Anh Tuan Luu, Thamar Solorio, Thien Huu Nguyen
Abstract
Information extraction (IE) systems rely on structured data for training, but such annotated data is highly imbalanced across languages, with low-resource languages receiving little attention. Label projection techniques aim to bridge this gap by transferring structured annotations from high-resource to low-resource languages. However, existing methods are either inaccurate or too slow for large-scale use. This work addresses the problem by developing a more effective method that remains efficient enough for large-scale projection. In particular, we propose to synthesize alignment sequence pairs and fine-tune an encoder model with a span alignment objective, while controlling data influence during training. Experimental results across 50+ languages show that our framework consistently outperforms previous state-of-the-art methods while maintaining fast inference speed. In addition, we introduce EXP, the first benchmark for explicit evaluation of label projection, thereby reducing confounders and non-determinism in method assessment.
- Anthology ID: 2026.acl-long.1817
- Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month: July
- Year: 2026
- Address: San Diego, California, United States
- Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 39175–39198
- URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1817/
- Cite (ACL): Thang Le, Huy Huu Nguyen, Anh Tuan Luu, Thamar Solorio, and Thien Huu Nguyen. 2026. Towards Fast and Accurate Modeling for Cross-Lingual Label Projection. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 39175–39198, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal): Towards Fast and Accurate Modeling for Cross-Lingual Label Projection (Le et al., ACL 2026)
- PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1817.pdf