Exploiting In-Domain Bilingual Corpora for Zero-Shot Transfer Learning in NLU of Intra-Sentential Code-Switching Chatbot Interactions
Maia Aguirre, Manex Serras, Laura García-Sardiña, Jacobo López-Fernández, Ariane Méndez, Arantza Del Pozo
Abstract
Code-switching (CS) is a very common phenomenon in regions where several languages co-exist. Because CS is such a frequent habit in informal communication, both spoken and written, it also arises naturally in human-machine interaction. Therefore, CS must be taken into account when developing chatbots so that natural language understanding (NLU) is not degraded. Multilingual language representation models such as mBERT have made it feasible for multiple languages to co-exist in a single NLU model. In this paper, the efficacy of zero-shot cross-lingual transfer learning with mBERT for NLU is evaluated on a Basque-Spanish CS chatbot corpus, comparing the performance of NLU models trained on in-domain chatbot utterances in Basque, Spanish, or both, none of which contain CS. The results indicate that training joint multi-intent classification and entity recognition models on both languages simultaneously achieves the best performance, as these models better capture the CS patterns.

- Anthology ID: 2022.emnlp-industry.13
- Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
- Month: December
- Year: 2022
- Address: Abu Dhabi, UAE
- Editors: Yunyao Li, Angeliki Lazaridou
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 138–144
- URL: https://aclanthology.org/2022.emnlp-industry.13
- DOI: 10.18653/v1/2022.emnlp-industry.13
- Cite (ACL): Maia Aguirre, Manex Serras, Laura García-Sardiña, Jacobo López-Fernández, Ariane Méndez, and Arantza Del Pozo. 2022. Exploiting In-Domain Bilingual Corpora for Zero-Shot Transfer Learning in NLU of Intra-Sentential Code-Switching Chatbot Interactions. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 138–144, Abu Dhabi, UAE. Association for Computational Linguistics.
- Cite (Informal): Exploiting In-Domain Bilingual Corpora for Zero-Shot Transfer Learning in NLU of Intra-Sentential Code-Switching Chatbot Interactions (Aguirre et al., EMNLP 2022)
- PDF: https://preview.aclanthology.org/nschneid-patch-2/2022.emnlp-industry.13.pdf
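The abstract mentions joint multi-intent classification and entity recognition models but does not say how their outputs are decoded. The following is a minimal sketch, assuming a common setup in which each intent receives an independent sigmoid score (so a single utterance can carry several intents) and entities are tagged per token with BIO labels. The function names, intent labels, entity types, and the 0.5 threshold are hypothetical illustrations, not the authors' implementation. The mBERT encoder that would produce the logits and tags is omitted.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_intents(intent_logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-intent decoding (hypothetical): every intent is an independent
    binary decision, so an utterance may get zero, one, or several intents."""
    return sorted(name for name, logit in intent_logits.items()
                  if sigmoid(logit) >= threshold)


def decode_bio(tokens: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Collect (entity_type, surface_text) spans from per-token BIO tags.
    An I- tag that does not continue the open span is leniently treated
    as the start of a new span."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def close_span() -> None:
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            close_span()  # finish any open span, then start a new one
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)  # continue the open span
        else:  # "O" closes any open span
            close_span()
            current_type, current_tokens = None, []
    close_span()  # flush a span that runs to the end of the utterance
    return spans


if __name__ == "__main__":
    # Hypothetical model outputs for one utterance.
    print(decode_intents({"book_table": 2.0, "ask_hours": -1.5, "greet": 0.3}))
    print(decode_bio(["reservar", "mesa", "para", "mañana", "a", "las", "nueve"],
                     ["O", "O", "O", "B-date", "B-time", "I-time", "I-time"]))
```

Thresholding each intent separately, rather than taking a softmax argmax, is what allows a single turn to be assigned more than one intent; the threshold value itself would normally be tuned on held-out data.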