Divide and Extract – Disentangling Clause Splitting and Proposition Extraction

Darina Gold, Torsten Zesch


Abstract
Proposition extraction from sentences is an important task for information extraction systems. Evaluation of such systems usually conflates two aspects: splitting complex sentences into clauses and extracting propositions. It is thus difficult to determine the quality of the proposition extraction step independently. We create a manually annotated proposition dataset from sentences taken from restaurant reviews that distinguishes between clauses that need to be split and those that do not. The resulting proposition evaluation dataset allows us to compare the performance of proposition extraction systems on simple and complex clauses independently. Although performance drops drastically on more complex sentences, we show that the same systems perform best on both simple and complex clauses. Furthermore, we show that specific kinds of subordinate clauses pose difficulties to most systems.
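The key design choice in the abstract is to score proposition extraction separately on clauses that need splitting and on those that do not. The following is a minimal Python sketch of such stratified scoring. It assumes that propositions are (subject, relation, object) triples compared by exact match. The `Item` structure, the `match_counts` and `stratified_scores` functions, and the toy review sentences are hypothetical illustrations, not the paper's data, matching criterion, or code.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Assumption: a proposition is a (subject, relation, object) triple.
Proposition = Tuple[str, str, str]


@dataclass
class Item:
    """One annotated sentence (hypothetical structure, for illustration only)."""
    sentence: str
    needs_split: bool        # True if the sentence has clauses that must be split
    gold: Set[Proposition]   # manually annotated propositions


def match_counts(gold: Set[Proposition], pred: Set[Proposition]) -> Tuple[int, int, int]:
    """Return (true positives, #predicted, #gold) under exact triple match (assumed criterion)."""
    return len(gold & pred), len(pred), len(gold)


def stratified_scores(items: List[Item],
                      predictions: List[Set[Proposition]]) -> Dict[str, Dict[str, float]]:
    """Micro-averaged precision, recall and F1, computed separately per stratum."""
    counts = {"simple": [0, 0, 0], "complex": [0, 0, 0]}  # [tp, predicted, gold]
    for item, pred in zip(items, predictions):
        key = "complex" if item.needs_split else "simple"
        tp, n_pred, n_gold = match_counts(item.gold, pred)
        counts[key][0] += tp
        counts[key][1] += n_pred
        counts[key][2] += n_gold
    scores = {}
    for key, (tp, n_pred, n_gold) in counts.items():
        p = tp / n_pred if n_pred else 0.0
        r = tp / n_gold if n_gold else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[key] = {"precision": p, "recall": r, "f1": f}
    return scores


# Invented toy data: one simple sentence, one sentence with two clauses.
items = [
    Item("The soup was cold.", False, {("the soup", "was", "cold")}),
    Item("The staff was friendly, but the food arrived late.", True,
         {("the staff", "was", "friendly"), ("the food", "arrived", "late")}),
]
# A hypothetical system that fails to split the second sentence.
preds = [
    {("the soup", "was", "cold")},
    {("the staff", "was", "friendly, but the food arrived late")},
]
scores = stratified_scores(items, preds)
print(scores["simple"]["f1"])   # 1.0
print(scores["complex"]["f1"])  # 0.0
```

Keeping separate counts per stratum instead of one pooled score lets a drop on complex sentences be seen apart from performance on simple ones. A more lenient matcher, for example one based on token overlap, could replace the exact match in `match_counts`.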
Anthology ID: R19-1047
Volume: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month: September
Year: 2019
Address: Varna, Bulgaria
Venue: RANLP
Publisher: INCOMA Ltd.
Pages: 399–408
URL: https://aclanthology.org/R19-1047
DOI: 10.26615/978-954-452-056-4_047
Cite (ACL): Darina Gold and Torsten Zesch. 2019. Divide and Extract – Disentangling Clause Splitting and Proposition Extraction. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 399–408, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal): Divide and Extract – Disentangling Clause Splitting and Proposition Extraction (Gold & Zesch, RANLP 2019)
PDF: https://preview.aclanthology.org/ingestion-script-update/R19-1047.pdf