Surprisal Predicts Code-Switching in Chinese-English Bilingual Text

Jesús Calvillo, Le Fang, Jeremy Cole, David Reitter


Abstract
Why do bilinguals switch languages within a sentence? This observational study asks whether word surprisal and word entropy predict code-switching in bilingual written conversation. We describe and model a new dataset of Chinese-English text with 1476 clean code-switched sentences, translated back into Chinese. The model includes known control variables together with word surprisal and word entropy. We found that word surprisal, but not entropy, is a significant predictor that explains code-switching above and beyond other well-known predictors. Sentence length, which has been related to sentence complexity, was also a significant predictor. We propose high cognitive effort as a reason for code-switching, as it leaves fewer resources for inhibition of the alternative language. We also corroborate previous findings, this time using a computational model of surprisal, a new language pair, and written language.
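For readers unfamiliar with the two predictors, the sketch below gives their standard information-theoretic definitions. It is only an illustration: the abstract does not say how the probabilities were estimated, and the function names and the toy next-word distribution are hypothetical, not taken from the paper or its dataset.

```python
import math

def surprisal(prob):
    """Surprisal (in bits) of a word that occurred with conditional probability `prob`."""
    return -math.log2(prob)

def entropy(dist):
    """Shannon entropy (in bits) of a next-word distribution given as {word: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical next-word distribution at one position in a sentence (assumed values).
next_word = {"meeting": 0.5, "会议": 0.25, "deadline": 0.25}

# Surprisal depends on the word that actually occurred.
word_surprisal = surprisal(next_word["deadline"])
# Entropy depends on the whole distribution, before the word is seen.
word_entropy = entropy(next_word)
```

Surprisal measures how unexpected the observed word is, while entropy measures the uncertainty about the upcoming word before it is seen, which is why the two can behave differently as predictors.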
Anthology ID:
2020.emnlp-main.330
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4029–4039
URL:
https://aclanthology.org/2020.emnlp-main.330
DOI:
10.18653/v1/2020.emnlp-main.330
Cite (ACL):
Jesús Calvillo, Le Fang, Jeremy Cole, and David Reitter. 2020. Surprisal Predicts Code-Switching in Chinese-English Bilingual Text. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4029–4039, Online. Association for Computational Linguistics.
Cite (Informal):
Surprisal Predicts Code-Switching in Chinese-English Bilingual Text (Calvillo et al., EMNLP 2020)
PDF:
https://preview.aclanthology.org/paclic-22-ingestion/2020.emnlp-main.330.pdf
Video:
https://slideslive.com/38938918