2025
Code-switching in Context: Investigating the Role of Discourse Topic in Bilingual Speech Production
Debasmita Bhattacharya | Anxin Yi | Siying Ding | Julia Hirschberg
Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)
Code-switching (CSW) in speech is motivated by conversational factors across levels of linguistic analysis. While we know much about why speakers code-switch, there remains great scope for exploring how CSW occurs in speech, particularly within the discourse-level linguistic context. We build on prior work by asking: how are patterns of CSW influenced by different conversational contexts spanning Academic, Cultural, Personal, and Professional discourse topics? To answer this, we annotate a Mandarin-English spontaneous speech corpus, and analyze its discourse topics alongside various aspects of CSW production. We show that discourse topics interact significantly with utterance-level CSW, resulting in distinctive patterns of CSW presence, richness, language direction, and syntax that are uniquely associated with different contexts. Our work is the first to take such a context-sensitive approach to studying CSW, contributing to a broader understanding of the discourse topics that motivate speakers to code-switch in diverse ways.
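The paper's analysis statistically relates discourse topic to utterance-level CSW. As a minimal sketch of one such test, the snippet below runs a chi-squared test of independence between discourse topic and CSW presence; the counts are hypothetical placeholders, not the paper's data, and the paper's actual statistics may differ.

```python
# Chi-squared test of independence between discourse topic and the
# presence of code-switching in an utterance. Counts are illustrative
# placeholders only.
from scipy.stats import chi2_contingency

# Rows: discourse topics; columns: [utterances with CSW, utterances without CSW]
observed = [
    [120, 380],  # Academic
    [210, 290],  # Cultural
    [250, 250],  # Personal
    [90, 410],   # Professional
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3g}")
```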
Does Context Matter? A Prosodic Comparison of English and Spanish in Monolingual and Multilingual Discourse Settings
Debasmita Bhattacharya | David Sasu | Michela Marchini | Natalie Schluter | Julia Hirschberg
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Different languages are known to have typical and distinctive prosodic profiles. However, the majority of work on prosody across languages has been restricted to monolingual discourse contexts. We build on prior studies by asking: how does the nature of the discourse context influence variations in the prosody of monolingual speech? To answer this question, we compare the prosody of spontaneous, conversational monolingual English and Spanish both in monolingual and in multilingual speech settings. For both languages, we find that monolingual speech produced in a monolingual context is prosodically different from that produced in a multilingual context, with differences becoming more marked as proximity to multilingual discourse increases. Our work is the first to incorporate multilingual discourse contexts into the study of native-level monolingual prosody, and has potential downstream applications for the recognition and synthesis of multilingual speech.
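One way to operationalize this comparison is to extract a prosodic feature per utterance and test for differences between the two discourse settings. The sketch below compares mean f0 across hypothetical monolingual-context and multilingual-context recordings with a Welch t-test; the file paths are placeholders, and the paper's actual features and tests may differ.

```python
# Compare mean fundamental frequency (f0) of utterances recorded in a
# monolingual vs. a multilingual discourse setting. File paths are
# hypothetical placeholders.
import librosa
import numpy as np
from scipy.stats import ttest_ind

def mean_f0(path):
    """Mean f0 (Hz) over voiced frames, estimated with pYIN."""
    y, sr = librosa.load(path, sr=16000)
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    return float(np.nanmean(f0))  # unvoiced frames are NaN

mono = [mean_f0(p) for p in ["mono_utt1.wav", "mono_utt2.wav"]]     # monolingual setting
multi = [mean_f0(p) for p in ["multi_utt1.wav", "multi_utt2.wav"]]  # multilingual setting
t, p = ttest_ind(mono, multi, equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.3g}")
```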
Discourse-Driven Code-Switching: Analyzing the Role of Content and Communicative Function in Spanish-English Bilingual Speech
Debasmita Bhattacharya | Juan Junco | Divya Tadimeti | Julia Hirschberg
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Code-switching (CSW) is commonly observed among bilingual speakers, and is motivated by various paralinguistic, syntactic, and morphological aspects of conversation. We build on prior work by asking: how do discourse-level aspects of dialogue – i.e. the content and function of speech – influence patterns of CSW? To answer this, we analyze the named entities and dialogue acts present in a Spanish-English spontaneous speech corpus, and build a predictive model of CSW based on our statistical findings. We show that discourse content and function interact with patterns of CSW to varying degrees, with a stronger influence from function overall. Our work is the first to take a discourse-sensitive approach to understanding the pragmatic and referential cues of bilingual speech and has potential applications in improving the prediction, recognition, and synthesis of code-switched speech that is grounded in authentic aspects of multilingual discourse.
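As a rough illustration of what a discourse-driven predictive model of CSW could look like, the sketch below fits a logistic regression over dialogue-act and named-entity features; the feature names and training examples are invented stand-ins, not the paper's annotations or model.

```python
# Predict whether an utterance contains code-switching from simple
# discourse features (dialogue act, named-entity presence). All data and
# feature names are hypothetical.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# (features, contains_CSW) pairs -- toy examples only
data = [
    ({"dialogue_act": "statement", "has_named_entity": True}, 1),
    ({"dialogue_act": "question", "has_named_entity": False}, 0),
    ({"dialogue_act": "backchannel", "has_named_entity": False}, 0),
    ({"dialogue_act": "statement", "has_named_entity": False}, 1),
]
X = [features for features, _ in data]
y = [label for _, label in data]

model = make_pipeline(DictVectorizer(), LogisticRegression())
model.fit(X, y)
print(model.predict([{"dialogue_act": "question", "has_named_entity": True}]))
```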
2024
Measuring Entrainment in Spontaneous Code-switched Speech
Debasmita Bhattacharya | Siying Ding | Alayna Nguyen | Julia Hirschberg
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
It is well-known that speakers who entrain to one another have more successful conversations than those who do not. Previous research has shown that interlocutors entrain on linguistic features in both written and spoken monolingual domains. More recent work on code-switched communication has also shown preliminary evidence of entrainment on certain aspects of code-switching (CSW). However, such studies of entrainment in code-switched domains have been extremely few and restricted to human-machine textual interactions. Our work studies code-switched spontaneous speech between humans, finding that (1) patterns of written and spoken entrainment in monolingual settings largely generalize to code-switched settings, and (2) some patterns of entrainment on code-switching in dialogue agent-generated text generalize to spontaneous code-switched speech. Our findings have important implications for the potentially “universal” nature of entrainment as a communication phenomenon, and suggest potential applications in inclusive and interactive speech technology.
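One standard entrainment measure is proximity: conversational partners should be closer to each other on a feature than speakers who never talked. The sketch below compares partner and non-partner differences on a hypothetical per-speaker CSW rate; the paper's actual features and entrainment measures may differ.

```python
# Proximity-style entrainment check: are conversational partners closer on
# a feature than non-partners? Values are hypothetical per-speaker CSW rates.
import numpy as np

pairs = [(0.22, 0.25), (0.10, 0.12), (0.31, 0.28)]  # one tuple per conversation

partner_diffs = [abs(a - b) for a, b in pairs]
speakers = [rate for pair in pairs for rate in pair]  # flatten; i // 2 is pair id
non_partner_diffs = [
    abs(speakers[i] - speakers[j])
    for i in range(len(speakers))
    for j in range(i + 1, len(speakers))
    if i // 2 != j // 2  # exclude true partners
]
print("partner mean diff:    ", np.mean(partner_diffs))
print("non-partner mean diff:", np.mean(non_partner_diffs))
```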
2020
Filler-gaps that neural networks fail to generalize
Debasmita Bhattacharya | Marten van Schijndel
Proceedings of the 24th Conference on Computational Natural Language Learning
It can be difficult to separate abstract linguistic knowledge in recurrent neural networks (RNNs) from surface heuristics. In this work, we probe for highly abstract syntactic constraints that have been claimed to govern the behavior of filler-gap dependencies across different surface constructions. For models to generalize abstract patterns in expected ways to unseen data, they must share representational features in predictable ways. We use cumulative priming to test for representational overlap between disparate filler-gap constructions in English and find evidence that the models learn a general representation for the existence of filler-gap dependencies. However, we find no evidence that the models learn any of the shared underlying grammatical constraints we tested. Our work raises questions about the degree to which RNN language models learn abstract linguistic representations.
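The paper's cumulative priming paradigm adapts RNN language models to one construction and tests them on another; as a simplified stand-in, one can condition an off-the-shelf LM on a prime sentence in-context and check whether surprisal on a different filler-gap construction drops. The sketch below uses GPT-2 rather than the paper's RNNs, and the sentences are illustrative.

```python
# In-context priming probe: does exposure to one filler-gap construction
# lower an LM's surprisal on another? A simplified stand-in for the
# paper's cumulative (fine-tuning based) priming with RNNs.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def surprisal(context, target):
    """Mean negative log-probability (nats/token) of target given context."""
    ctx = tok(context, return_tensors="pt").input_ids
    tgt = tok(target, return_tensors="pt").input_ids
    ids = torch.cat([ctx, tgt], dim=1)
    with torch.no_grad():
        logits = model(ids).logits
    logp = torch.log_softmax(logits[0, :-1], dim=-1)  # position i predicts token i+1
    tgt_logp = logp[ctx.size(1) - 1 :].gather(1, tgt[0].unsqueeze(1))
    return -tgt_logp.mean().item()

prime = "I know what the boy ate yesterday. "     # embedded wh-question (filler-gap)
target = "That is the cake that the girl baked."  # relative clause (filler-gap)
print("primed:  ", surprisal(prime, target))
print("unprimed:", surprisal(" ", target))
```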
2019
Racial Bias in Hate Speech and Abusive Language Detection Datasets
Thomas Davidson | Debasmita Bhattacharya | Ingmar Weber
Proceedings of the Third Workshop on Abusive Language Online
Technologies for abusive language detection are being developed and applied with little consideration of their potential biases. We examine racial bias in five different sets of Twitter data annotated for hate speech and abusive language. We train classifiers on these datasets and compare the predictions of these classifiers on tweets written in African-American English with those written in Standard American English. The results show evidence of systematic racial bias in all datasets, as classifiers trained on them tend to predict that tweets written in African-American English are abusive at substantially higher rates. If these abusive language detection systems are used in the field, they will therefore have a disproportionate negative impact on African-American social media users. Consequently, these systems may discriminate against the groups who are often the targets of the abuse we are trying to detect.
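As an illustration of the audit procedure the abstract describes, the sketch below trains a text classifier on labeled tweets and compares the rates at which it flags two dialect-identified groups as abusive; all data here are toy placeholders, not the study's five Twitter corpora.

```python
# Train an abusive-language classifier, then compare predicted-abusive
# rates across dialect groups. Texts and labels are toy placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["you are awful", "have a nice day", "i hate this", "great game tonight"]
train_labels = [1, 0, 1, 0]  # 1 = abusive (toy annotation)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)

# Held-out tweets grouped by a (hypothetical) dialect identifier.
groups = {
    "AAE": ["placeholder aae tweet"],  # tweets identified as African-American English
    "SAE": ["placeholder sae tweet"],  # tweets identified as Standard American English
}
for name, tweets in groups.items():
    rate = clf.predict(tweets).mean()
    print(f"{name}: predicted-abusive rate = {rate:.2f}")
```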