A Transformer Based Approach towards Identification of Discourse Unit Segments and Connectives

Sahil Bakshi, Dipti Sharma


Abstract
Discourse parsing, which involves understanding the structure and information flow of a text and modeling its coherence, is an important task in natural language processing. It forms the basis of several natural language processing tasks such as question answering, text summarization, and sentiment analysis. Discourse unit segmentation is one of the fundamental tasks in discourse parsing and refers to identifying the elementary units that combine to form a coherent text. In this paper, we present a transformer-based approach to the automated identification of discourse unit segments and connectives. Early approaches to segmentation relied on rule-based systems that used POS tags and other syntactic information to identify discourse segments. Recently, transformer-based neural systems have shown promising results in this domain. Our system, SegFormers, employs this transformer-based approach to perform multilingual discourse segmentation and connective identification across 16 datasets encompassing 11 languages and 3 different annotation frameworks. We evaluate the system using F1 scores for both tasks; the best system achieves its highest F1 score, 97.02%, on the treebanked English RST-DT dataset.
Anthology ID:
2021.disrpt-1.2
Volume:
Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021)
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Amir Zeldes, Yang Janet Liu, Mikel Iruskieta, Philippe Muller, Chloé Braud, Sonia Badene
Venue:
DISRPT
Publisher:
Association for Computational Linguistics
Pages:
13–21
URL:
https://aclanthology.org/2021.disrpt-1.2
DOI:
10.18653/v1/2021.disrpt-1.2
Cite (ACL):
Sahil Bakshi and Dipti Sharma. 2021. A Transformer Based Approach towards Identification of Discourse Unit Segments and Connectives. In Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021), pages 13–21, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
A Transformer Based Approach towards Identification of Discourse Unit Segments and Connectives (Bakshi & Sharma, DISRPT 2021)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2021.disrpt-1.2.pdf