Modeling Cultural and Subcultural Variation in Code-Switched Discourse with Topic Annotation

Nemika Tyagi, Nelvin Licona-Guevara, Olga Kellert


Abstract
Code-switching is often modeled in NLP as a structural or token-level phenomenon, overlooking its role as a discourse practice shaped by social and cultural context. In this work, we propose topic-based annotation as a framework for analyzing cultural and subcultural variation in bilingual discourse. Using large language models, we annotate 3,691 code-switched sentences from Spanish-English (Miami) and Spanish-Guaraní (Paraguay) corpora with topic and discourse-level information, integrating sociolinguistic metadata. Our analysis reveals systematic relationships between discourse topics, language choice, and social variables such as gender and language dominance. We observe subcultural variation within the Miami community and a clear diglossic distribution in Paraguay, where Guaraní is associated with formal domains and Spanish with informal communication. These findings suggest that modeling code-switching through discourse-level categories provides a more complete representation of multilingual communication and enables both cross-cultural and intra-cultural comparison at scale.
Anthology ID:
2026.c3nlp-1.3
Volume:
Proceedings of the 4th Workshop on Cross-Cultural Considerations in NLP (C3NLP 2026)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Vinodkumar Prabhakaran, Sunipa Dev, Luciana Benotti, Daniel Hershcovich, Yong Cao, Li Zhou, Bolei Ma, Ife Adebara
Venues:
C3NLP | WS
Publisher:
Association for Computational Linguistics
Pages:
40–49
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.c3nlp-1.3/
Cite (ACL):
Nemika Tyagi, Nelvin Licona-Guevara, and Olga Kellert. 2026. Modeling Cultural and Subcultural Variation in Code-Switched Discourse with Topic Annotation. In Proceedings of the 4th Workshop on Cross-Cultural Considerations in NLP (C3NLP 2026), pages 40–49, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Modeling Cultural and Subcultural Variation in Code-Switched Discourse with Topic Annotation (Tyagi et al., C3NLP 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.c3nlp-1.3.pdf