Cross-Lingual NLU: Mitigating Language-Specific Impact in Embeddings Leveraging Adversarial Learning

Saedeh Tahery, Sahar Kianian, Saeed Farzi


Abstract
Low-resource languages and computational expense pose significant challenges for large language models (LLMs), and researchers are actively working to address them. Cross-lingual natural language processing (NLP) remains one of the most promising strategies for doing so. In this paper, we introduce a novel approach that uses adversarial techniques to mitigate the impact of language-specific information in contextual embeddings generated by large multilingual language models, with potential applications in cross-lingual tasks. The study covers five languages, both Latin and non-Latin, on two fundamental natural language understanding tasks: intent detection and slot filling. The results primarily show that our current approach excels in zero-shot scenarios for Latin languages such as Spanish, but that it is limited when applied to languages distant from English, such as Thai and Persian. This indicates that, while our approach effectively reduces the effect of language-specific information on the core meaning, it performs better for Latin languages that share language-specific nuances with English, because certain characteristics persist in the overall meaning within the embeddings.
Anthology ID:
2024.lrec-main.370
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
4158–4163
URL:
https://aclanthology.org/2024.lrec-main.370
Cite (ACL):
Saedeh Tahery, Sahar Kianian, and Saeed Farzi. 2024. Cross-Lingual NLU: Mitigating Language-Specific Impact in Embeddings Leveraging Adversarial Learning. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 4158–4163, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Cross-Lingual NLU: Mitigating Language-Specific Impact in Embeddings Leveraging Adversarial Learning (Tahery et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/add_acl24_videos/2024.lrec-main.370.pdf