Kent K. Chang

2026

Multimodal Conversation Structure Understanding
Kent K. Chang | Mackenzie Hanh Cramer | Anna Ho | Ti Ti Nguyen | Yilin Yuan | David Bamman
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

While multimodal large language models (LLMs) excel at dialogue, whether they can adequately parse the structure of conversation—conversational roles and threading—remains underexplored. In this work, we introduce a suite of tasks and release TV-MMPC, a new annotated dataset, for multimodal conversation structure understanding. Our evaluation reveals that while all multimodal LLMs outperform our heuristic baseline, even the best-performing model we consider experiences a substantial drop in performance when character identities of the conversation are anonymized. Beyond evaluation, we carry out a sociolinguistic analysis of 350,842 utterances in TVQA. We find that while female characters initiate conversations at rates in proportion to their speaking time, they are 1.2 times more likely than men to be cast as an addressee or side-participant, and the presence of side-participants shifts the conversational register from personal to social.

Language Models as Measurement Apparatus for Culture
Kent K. Chang
Proceedings of The Big Picture v2: Crafting a Research Narrative

Language models are increasingly used to quantify cultural phenomena, but what makes such measurement distinctively cultural? This paper argues that NLP work on culture is a material-discursive practice: the apparatus—model, data, annotation, evaluation—participates in constituting the cultural reality it measures, rather than passively recording it. Drawing on Karen Barad’s concept of the agential cut—the contingent boundary between phenomenon and instrument—I show that the apparatus’s substantive design choices draw such boundaries, and that the boundary is entangled from the start because language models have already internalized much of the cultural material they measure. I illustrate this through three case studies on television and film dialogue and two examinations of the apparatus itself: erasure of character names as cultural markers, and attunement to historically distant Restoration drama. This big picture analysis proposes a research program that is theory-driven, empirically rigorous, and culturally contingent, treating each agential cut as a conscious commitment.

2023

Dramatic Conversation Disentanglement
Kent K. Chang | Danica Chen | David Bamman
Findings of the Association for Computational Linguistics: ACL 2023

We present a new dataset for studying conversation disentanglement in movies and TV series. While previous work has focused on conversation disentanglement in IRC chatroom dialogues, movies and TV shows provide a space for studying complex pragmatic patterns of floor and topic change in face-to-face multi-party interactions. In this work, we draw on theoretical research in sociolinguistics, sociology, and film studies to operationalize a conversational thread (including the notion of a floor change) in dramatic texts, and use that definition to annotate a dataset of 10,033 dialogue turns (comprising 2,209 threads) from 831 movies. We compare the performance of several disentanglement models on this dramatic dataset, and apply the best-performing model to disentangle 808 movies. We see that, contrary to expectation, average thread lengths do not decrease significantly over the past 40 years, and characters portrayed by actors who are women, while underrepresented, initiate more new conversational threads relative to their speaking time.

Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4
Kent K. Chang | Mackenzie Cramer | Sandeep Soni | David Bamman
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

In this work, we carry out a data archaeology to infer books that are known to ChatGPT and GPT-4 using a name cloze membership inference query. We find that OpenAI models have memorized a wide collection of copyrighted materials, and that the degree of memorization is tied to the frequency with which passages of those books appear on the web. The ability of these models to memorize an unknown set of books complicates assessments of measurement validity for cultural analytics by contaminating test data; we show that models perform much better on memorized books than on non-memorized books for downstream tasks. We argue that this supports a case for open models whose training data is known.
