Discourse Coherence in the Wild: A Dataset, Evaluation and Methods

Alice Lai, Joel Tetreault


Abstract
To date, there has been very little work on assessing discourse coherence methods on real-world data. To address this, we present a new corpus of real-world texts (GCDC) as well as the first large-scale evaluation of leading discourse coherence algorithms. We show that neural models, including two that we introduce here (SentAvg and ParSeq), tend to perform best. We analyze these performance differences and discuss patterns we observed in low-coherence texts in four domains.
Anthology ID:
W18-5023
Volume:
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Kazunori Komatani, Diane Litman, Kai Yu, Alex Papangelis, Lawrence Cavedon, Mikio Nakano
Venue:
SIGDIAL
SIG:
SIGDIAL
Publisher:
Association for Computational Linguistics
Pages:
214–223
URL:
https://aclanthology.org/W18-5023
DOI:
10.18653/v1/W18-5023
Cite (ACL):
Alice Lai and Joel Tetreault. 2018. Discourse Coherence in the Wild: A Dataset, Evaluation and Methods. In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, pages 214–223, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Discourse Coherence in the Wild: A Dataset, Evaluation and Methods (Lai & Tetreault, SIGDIAL 2018)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/W18-5023.pdf
Attachment:
W18-5023.Attachment.pdf
Code:
aylai/GCDC-corpus
Data:
GCDC