Abstract
Thanks to recent pretrained multilingual representation models, it has become feasible to exploit labeled data from one language to train a cross-lingual model that can then be applied to multiple new languages. In practice, however, we still face the problem of scarce labeled data, leading to subpar results. In this paper, we propose a novel data augmentation strategy for better cross-lingual natural language inference by enriching the data to reflect more diversity in a semantically faithful way. To this end, we propose two methods of training a generative model to induce synthesized examples, and then leverage the resulting data using an adversarial training regimen for more robustness. In a series of detailed experiments, we show that this fruitful combination leads to substantial gains in cross-lingual inference.
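The abstract describes the pipeline only at a high level, so the sketch below is a minimal, assumption-laden illustration of the adversarial-training half: an FGM-style perturbation of the embedding layer while training a small NLI classifier on a batch of (original plus synthesized) examples. The `ToyNLIClassifier`, the `fgm_step` helper, and the `EPSILON` radius are hypothetical stand-ins for illustration, not the paper's actual architecture, generative augmentation method, or hyperparameters.

```python
# Minimal sketch of embedding-level adversarial training (FGM-style).
# All names here are illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

EPSILON = 1.0  # assumed perturbation radius (a tunable hyperparameter)

class ToyNLIClassifier(nn.Module):
    """Tiny stand-in for a multilingual encoder plus a 3-way NLI head."""
    def __init__(self, vocab_size=1000, dim=64, num_labels=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, num_labels)

    def forward(self, token_ids):
        x = self.embed(token_ids)        # (batch, seq, dim)
        _, h = self.encoder(x)           # final hidden state (1, batch, dim)
        return self.head(h.squeeze(0))   # (batch, num_labels)

def fgm_step(model, token_ids, labels, optimizer):
    """One step combining the clean loss with the loss under an
    adversarial perturbation of the embedding table."""
    optimizer.zero_grad()
    clean_loss = F.cross_entropy(model(token_ids), labels)
    clean_loss.backward()                # populates embedding gradients

    emb = model.embed.weight
    grad = emb.grad
    norm = grad.norm()
    if norm > 0:
        delta = EPSILON * grad / norm    # ascent direction on embeddings
        emb.data.add_(delta)             # apply perturbation
        adv_loss = F.cross_entropy(model(token_ids), labels)
        adv_loss.backward()              # accumulate adversarial gradients
        emb.data.sub_(delta)             # restore the original embeddings

    optimizer.step()
    return clean_loss.item()

# Toy batch standing in for original + synthesized premise-hypothesis pairs.
model = ToyNLIClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
token_ids = torch.randint(0, 1000, (8, 16))
labels = torch.randint(0, 3, (8,))       # entail / neutral / contradict
print(fgm_step(model, token_ids, labels, opt))
```

The design point the abstract hinges on is that the perturbation is applied in embedding space rather than to discrete tokens, so the same regimen can be used unchanged on the synthesized examples produced by the generative model.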
- Anthology ID:
- 2021.acl-long.401
- Volume:
- Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
- Month:
- August
- Year:
- 2021
- Address:
- Online
- Editors:
- Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
- Venues:
- ACL | IJCNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 5158–5167
- URL:
- https://aclanthology.org/2021.acl-long.401
- DOI:
- 10.18653/v1/2021.acl-long.401
- Cite (ACL):
- Xin Dong, Yaxin Zhu, Zuohui Fu, Dongkuan Xu, and Gerard de Melo. 2021. Data Augmentation with Adversarial Training for Cross-Lingual NLI. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5158–5167, Online. Association for Computational Linguistics.
- Cite (Informal):
- Data Augmentation with Adversarial Training for Cross-Lingual NLI (Dong et al., ACL-IJCNLP 2021)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2021.acl-long.401.pdf