Do sentence embeddings capture discourse properties of sentences from Scientific Abstracts ?

Laurine Huber, Chaker Memmadi, Mathilde Dargnat, Yannick Toussaint


Abstract
We introduce four tasks designed to determine which sentence encoders best capture discourse properties of sentences from scientific abstracts, namely coherence and cohesion between clauses of a sentence, and discourse relations within sentences. We show that even though contextual encoders such as BERT or SciBERT encode the coherence in discourse units, they do not help to predict three discourse relations commonly used in scientific abstracts. We discuss what these results underline, namely that these discourse relations are based on particular phrasing that allows non-contextual encoders to perform well.
Anthology ID:
2020.codi-1.9
Volume:
Proceedings of the First Workshop on Computational Approaches to Discourse
Month:
November
Year:
2020
Address:
Online
Editors:
Chloé Braud, Christian Hardmeier, Junyi Jessy Li, Annie Louis, Michael Strube
Venue:
CODI
Publisher:
Association for Computational Linguistics
Pages:
86–95
URL:
https://aclanthology.org/2020.codi-1.9
DOI:
10.18653/v1/2020.codi-1.9
Cite (ACL):
Laurine Huber, Chaker Memmadi, Mathilde Dargnat, and Yannick Toussaint. 2020. Do sentence embeddings capture discourse properties of sentences from Scientific Abstracts ?. In Proceedings of the First Workshop on Computational Approaches to Discourse, pages 86–95, Online. Association for Computational Linguistics.
Cite (Informal):
Do sentence embeddings capture discourse properties of sentences from Scientific Abstracts ? (Huber et al., CODI 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2020.codi-1.9.pdf
Video:
https://slideslive.com/38939694
Data
SentEval