Linea Flansmose Mikkelsen
2022
DDisCo: A Discourse Coherence Dataset for Danish
Linea Flansmose Mikkelsen | Oliver Kinch | Anders Jess Pedersen | Ophélie Lacroix
Proceedings of the Thirteenth Language Resources and Evaluation Conference
To date, there has been no resource for studying discourse coherence in real-world Danish texts. Discourse coherence has mostly been approached under the assumption that incoherent texts can be represented by coherent texts whose sentences have been shuffled. However, real-world incoherent texts rarely resemble such shuffled texts. We therefore present DDisCo, a dataset of texts from the Danish Wikipedia and Reddit annotated for discourse coherence. We annotate real-world texts rather than relying on artificially incoherent text for training and testing models. We then evaluate the performance of several methods, including neural networks, on the dataset.
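For illustration only, the sentence-shuffling assumption that the abstract argues against can be sketched as follows. This is not DDisCo's method or code: the function name `shuffle_sentences`, the `seed` parameter, and the toy document are all hypothetical, and the sketch assumes a text has already been split into sentences.

```python
import random


def shuffle_sentences(sentences, seed=0):
    """Return a reordered copy of `sentences` (hypothetical helper).

    This mimics the common way of building an artificially "incoherent"
    text: keep the sentences of a coherent text, but permute their order.
    """
    original = list(sentences)
    # With fewer than two distinct sentences no different order exists.
    if len(set(original)) < 2:
        return original
    rng = random.Random(seed)  # seeded so the result is reproducible
    shuffled = list(original)
    # Reshuffle until the order actually differs from the original.
    while shuffled == original:
        rng.shuffle(shuffled)
    return shuffled


# Toy document (invented for this sketch, not taken from the dataset).
coherent = [
    "Copenhagen is the capital of Denmark.",
    "It lies on the island of Zealand.",
    "The city is known for its harbour.",
]
artificial_incoherent = shuffle_sentences(coherent, seed=1)
```

A text built this way contains exactly the same sentences as its source, only in a different order. The abstract's point is that naturally occurring incoherent texts rarely look like this, which is why the dataset relies on human annotation of real texts instead.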