Exploiting In-Domain Bilingual Corpora for Zero-Shot Transfer Learning in NLU of Intra-Sentential Code-Switching Chatbot Interactions

Maia Aguirre, Manex Serras, Laura García-Sardiña, Jacobo López-Fernández, Ariane Méndez, Arantza Del Pozo


Abstract
Code-switching (CS) is very common in regions where several languages coexist. Because CS is so frequent in informal communication, both spoken and written, it also arises naturally in human-machine interactions. CS must therefore be taken into account when developing chatbots so that natural language understanding (NLU) is not degraded. Multilingual language representation models such as mBERT have made it feasible for multiple languages to coexist in a single NLU model. This paper evaluates the efficacy of zero-shot cross-lingual transfer learning with mBERT for NLU on a Basque-Spanish CS chatbot corpus, comparing NLU models trained on in-domain chatbot utterances without CS in Basque only, Spanish only, or both. The results indicate that training joint multi-intent classification and entity recognition models on both languages simultaneously achieves the best performance and better captures the CS patterns.
Anthology ID:
2022.emnlp-industry.13
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
December
Year:
2022
Address:
Abu Dhabi, UAE
Editors:
Yunyao Li, Angeliki Lazaridou
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
138–144
URL:
https://aclanthology.org/2022.emnlp-industry.13
DOI:
10.18653/v1/2022.emnlp-industry.13
Cite (ACL):
Maia Aguirre, Manex Serras, Laura García-Sardiña, Jacobo López-Fernández, Ariane Méndez, and Arantza Del Pozo. 2022. Exploiting In-Domain Bilingual Corpora for Zero-Shot Transfer Learning in NLU of Intra-Sentential Code-Switching Chatbot Interactions. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 138–144, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Exploiting In-Domain Bilingual Corpora for Zero-Shot Transfer Learning in NLU of Intra-Sentential Code-Switching Chatbot Interactions (Aguirre et al., EMNLP 2022)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2022.emnlp-industry.13.pdf