HILDA: A Discourse Parser Using Support Vector Machine Classification

Hugo Hernault, Helmut Prendinger, David A. du Verle, Mitsuru Ishizuka


Abstract
Discourse structures play a central role in several computational tasks, such as question answering and dialogue generation. In particular, the framework of Rhetorical Structure Theory (RST) offers a sound formalism for hierarchical text organization. In this article, we present HILDA, an implemented discourse parser based on RST and Support Vector Machine (SVM) classification. SVM classifiers are trained and applied to discourse segmentation and relation labeling. By combining labeling with a greedy bottom-up tree-building approach, we are able to create accurate discourse trees in linear time. Importantly, our parser can parse entire texts, whereas the publicly available parser SPADE (Soricut and Marcu 2003) is limited to sentence-level analysis. HILDA outperforms other discourse parsers for tree structure construction and discourse relation labeling. For the discourse parsing task, our system reaches 78.3% of the performance level of human annotators. Compared to a state-of-the-art rule-based discourse parser, our system achieves a performance increase of 11.6%.
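The greedy bottom-up tree building mentioned in the abstract can be illustrated with a minimal sketch. This is not HILDA's implementation: the `Node` class, the `build_tree` function, and the `toy_score` heuristic are all hypothetical names introduced here, and `toy_score` merely stands in for the trained SVM classifiers. The sketch assumes the text is already segmented and that a scorer returns a merge score and a relation label for any two adjacent spans.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """A discourse subtree: a leaf holds one segment, an internal node two children."""
    text: str
    relation: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# Hypothetical stand-in for trained classifiers: returns (merge score, relation label).
Scorer = Callable[[Node, Node], Tuple[float, str]]


def build_tree(segments: List[str], score: Scorer) -> Node:
    """Greedily merge the best-scoring adjacent pair until one tree remains."""
    if not segments:
        raise ValueError("need at least one segment")
    nodes = [Node(text=s) for s in segments]
    # scores[i] describes the candidate merge of nodes[i] and nodes[i + 1].
    scores = [score(a, b) for a, b in zip(nodes, nodes[1:])]
    while len(nodes) > 1:
        i = max(range(len(scores)), key=lambda k: scores[k][0])
        _, relation = scores[i]
        merged = Node(
            text=nodes[i].text + " " + nodes[i + 1].text,
            relation=relation,
            left=nodes[i],
            right=nodes[i + 1],
        )
        nodes[i:i + 2] = [merged]
        del scores[i]
        # Only the pairs touching the new node need rescoring.
        if i > 0:
            scores[i - 1] = score(nodes[i - 1], nodes[i])
        if i < len(scores):
            scores[i] = score(nodes[i], nodes[i + 1])
    return nodes[0]


def toy_score(a: Node, b: Node) -> Tuple[float, str]:
    # Toy heuristic, not a trained model: prefer merging shorter spans first.
    return -float(len(a.text) + len(b.text)), "Elaboration"
```

Because each merge rescores at most two neighboring pairs, the number of scorer calls grows linearly with the number of segments. The sketch still rescans the score list to find the maximum at every step, so its overall running time is quadratic; it is meant to show the merge order, not to reproduce the complexity result reported in the abstract.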
Anthology ID:
2010.dnd-1.1
Volume:
Dialogue & Discourse, Volume 1
Year:
2010
Editors:
Jonathan Ginzburg, Massimo Poesio, Tim Paek
Venue:
DND
SIG:
SIGDIAL
Pages:
1–33
URL:
https://preview.aclanthology.org/ingest-dnd/2010.dnd-1.1/
DOI:
10.5087/dad.2010.003
Cite (ACL):
Hugo Hernault, Helmut Prendinger, David A. du Verle, and Mitsuru Ishizuka. 2010. HILDA: A Discourse Parser Using Support Vector Machine Classification. Dialogue & Discourse, 1:1–33.
Cite (Informal):
HILDA: A Discourse Parser Using Support Vector Machine Classification (Hernault et al., DND 2010)
PDF:
https://preview.aclanthology.org/ingest-dnd/2010.dnd-1.1.pdf