Baselines for Detection and Classification of Discourse Presentation in English Narrative

Reinaldo Di Polo, Mustafa Ocal, Mark Finlayson


Abstract
Discourse presentation occurs when speech, writing, or thought (SW&T) attributed to a discourse entity (such as a character in a narrative) is presented within a discourse. Discourse presentations can be broadly divided into direct and indirect: direct presentation quotes the words or thoughts verbatim, whereas indirect presentation expresses the SW&T in the narrator’s or writer’s own words. Automatically detecting and categorizing discourse presentations supports discourse and narrative analysis and improves attribution for downstream NLP tasks, but detecting indirect discourse presentations remains challenging due to diverse surface forms and subtle perspective shifts. We study detection and categorization of discourse presentations on a corrected version of Semino & Short’s English Narrative SW&TP corpus. We cast the task as five-way clause classification: Direct Speech & Writing, Direct Thought, Indirect Speech & Writing, Indirect Thought, and Narrative (i.e., no discourse presentation). We compare four approaches: (1) a CNN; (2) a generative baseline (Claude Sonnet 4.6); (3) untuned BERT; and (4) fine-tuned BERT. The CNN baseline achieves 0.43 F1 and exhibits substantial confusion with the Narrative class. Claude achieves 0.71 F1 but performs unevenly across classes and fails to recover Indirect Thought. Untuned BERT achieves 0.81 F1 overall but struggles on indirect categories. Fine-tuned BERT yields strong performance (0.88 F1), with remaining errors concentrated in Indirect Speech & Writing (F1 = 0.60). We release our code and the corrected dataset to support reproducibility. To our knowledge, this is the first time computational approaches have been evaluated across the full range of SW&TP discourse presentation types.
Anthology ID:
2026.codi-1.3
Volume:
Proceedings of the 2nd Joint Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences and Computational Models of Reference, Anaphora and Coreference (CODI-CRAC 2026)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Chloé Braud, Christian Hardmeier, Maciej Ogrodniczuk, Sharid Loaiciga, Amir Zeldes, Michal Novák, Chuyuan Li, Michael Strube, Junyi Jessy Li
Venues:
CODI | CRAC | WS
Publisher:
Association for Computational Linguistics
Pages:
1–11
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.codi-1.3/
Cite (ACL):
Reinaldo Di Polo, Mustafa Ocal, and Mark Finlayson. 2026. Baselines for Detection and Classification of Discourse Presentation in English Narrative. In Proceedings of the 2nd Joint Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences and Computational Models of Reference, Anaphora and Coreference (CODI-CRAC 2026), pages 1–11, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Baselines for Detection and Classification of Discourse Presentation in English Narrative (Di Polo et al., CODI-CRAC 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.codi-1.3.pdf
Supplementary material:
2026.codi-1.3.SupplementaryMaterial.zip