Dataset Cartography for Implicit Discourse Relation Recognition: Promises and Pitfalls

Daniil Ignatev; Denis Paperno; Massimo Poesio

Abstract

Crowdsourced data for implicit discourse relation recognition (IDRR) has been shown to contain both plausible interpretations and noisy annotations. We present a case study of dataset cartography (Swayamdipta et al., 2020) on the IDRR-focused DiscoGeM corpus (Scholman et al., 2022). Our findings show that identifying errors via low confidence is unreliable, because confidence is strongly affected by label rarity. However, high-confidence datapoints reveal a different use case: auditing the cue-rich regions of the dataset. Our lexical probe demonstrates an association between high-confidence items and (mostly temporal) intra-argument cue words. Dataset cartography can thus serve as a diagnostic of cue-driven, easy-to-learn cases, which need to be balanced out to ensure the robustness of IDRR learning.
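
For readers unfamiliar with the confidence measure the abstract refers to, the following is a minimal sketch of the standard dataset cartography statistics: confidence is the mean probability the model assigns to the gold label across training epochs, variability is the standard deviation of that probability, and correctness is the fraction of epochs in which the item was predicted correctly. The function name `cartography_stats` and the per-epoch numbers are hypothetical illustrations, not values or code from the paper.

```python
from statistics import mean, pstdev


def cartography_stats(gold_probs, correct_flags):
    """Summarise one item's training dynamics.

    gold_probs: probability assigned to the gold label at each epoch.
    correct_flags: whether the model's prediction matched the gold label
        at each epoch.
    """
    return {
        # Mean gold-label probability across epochs.
        "confidence": mean(gold_probs),
        # Population standard deviation of the gold-label probability.
        "variability": pstdev(gold_probs),
        # Share of epochs with a correct prediction.
        "correctness": sum(correct_flags) / len(correct_flags),
    }


# Hypothetical training dynamics over four epochs (made-up numbers).
easy = cartography_stats([0.8, 0.9, 0.9, 1.0], [True, True, True, True])
rare = cartography_stats([0.1, 0.1, 0.2, 0.2], [False, False, False, False])

print(easy)
print(rare)
```

In this toy example the second item ends up with low confidence, but the statistics alone cannot tell whether that reflects a wrong annotation or simply a label the model rarely sees, which is the kind of ambiguity the abstract points to.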

Anthology ID: 2026.codi-1.8
Volume: Proceedings of the 2nd Joint Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences and Computational Models of Reference, Anaphora and Coreference (CODI-CRAC 2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Chloé Braud, Christian Hardmeier, Maciej Ogrodniczuk, Sharid Loaiciga, Amir Zeldes, Michal Novák, Chuyuan Li, Michael Strube, Junyi Jessy Li
Venues: CODI | CRAC | WS
Publisher: Association for Computational Linguistics
Pages: 53–64
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.codi-1.8/
Cite (ACL): Daniil Ignatev, Denis Paperno, and Massimo Poesio. 2026. Dataset Cartography for Implicit Discourse Relation Recognition: Promises and Pitfalls. In Proceedings of the 2nd Joint Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences and Computational Models of Reference, Anaphora and Coreference (CODI-CRAC 2026), pages 53–64, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Dataset Cartography for Implicit Discourse Relation Recognition: Promises and Pitfalls (Ignatev et al., CODI-CRAC 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.codi-1.8.pdf
Supplementary material: 2026.codi-1.8.SupplementaryMaterial.zip
