BeDiscovER: The Benchmark of Discourse Understanding in the Era of Reasoning Language Models

Chuyuan Li; Giuseppe Carenini

Abstract

We introduce BeDiscovER (Benchmark of Discourse Understanding in the Era of Reasoning Language Models), an up-to-date, comprehensive suite for evaluating the discourse-level knowledge of modern LLMs. BeDiscovER compiles 5 publicly available discourse tasks spanning the discourse lexicon, (multi-)sentential, and document levels, for a total of 52 individual datasets. It covers extensively studied tasks such as discourse parsing and temporal relation extraction, novel challenges such as discourse particle disambiguation (e.g., just), and a shared task on Discourse Relation Parsing and Treebanking for multilingual and multi-framework discourse relation classification. We evaluate the open-source Qwen3 series and DeepSeek-R1, along with the frontier reasoning model GPT-5-mini, on BeDiscovER and find that state-of-the-art models exhibit strong performance on the arithmetic aspect of temporal reasoning but struggle with long-dependency reasoning and with some subtle semantic and discourse phenomena, such as rhetorical relation classification.

Anthology ID: 2026.eacl-long.207
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 4417–4479
URL: https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.207/
Cite (ACL): Chuyuan Li and Giuseppe Carenini. 2026. BeDiscovER: The Benchmark of Discourse Understanding in the Era of Reasoning Language Models. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4417–4479, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): BeDiscovER: The Benchmark of Discourse Understanding in the Era of Reasoning Language Models (Li & Carenini, EACL 2026)
PDF: https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.207.pdf
