Abstract
Code-switching is a ubiquitous phenomenon in multilingual communities. Natural language technologies that aim to communicate like humans must therefore adaptively incorporate code-switching techniques when they are deployed in multilingual settings. To this end, we propose a Hindi-English human-machine dialogue system that elicits code-switched conversations in a controlled setting. The system's agent uses different code-switching strategies, which lets us study how users respond and accommodate to the agent’s language choice. Through this system, we collect and release a new dataset, CommonDost, comprising 439 human-machine multilingual conversations. We adapt pre-defined metrics to detect linguistic accommodation from users to agents. Finally, we compare these dialogues with Spanish-English dialogues collected in a similar setting, and analyze the impact of linguistic and socio-cultural factors on code-switching patterns across the two language pairs.
- Anthology ID:
- 2020.conll-1.46
- Volume:
- Proceedings of the 24th Conference on Computational Natural Language Learning
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Raquel Fernández, Tal Linzen
- Venue:
- CoNLL
- SIG:
- SIGNLL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 565–577
- URL:
- https://aclanthology.org/2020.conll-1.46
- DOI:
- 10.18653/v1/2020.conll-1.46
- Cite (ACL):
- Tanmay Parekh, Emily Ahn, Yulia Tsvetkov, and Alan W Black. 2020. Understanding Linguistic Accommodation in Code-Switched Human-Machine Dialogues. In Proceedings of the 24th Conference on Computational Natural Language Learning, pages 565–577, Online. Association for Computational Linguistics.
- Cite (Informal):
- Understanding Linguistic Accommodation in Code-Switched Human-Machine Dialogues (Parekh et al., CoNLL 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2020.conll-1.46.pdf
- Code:
- tanmayparekh/commondost