Facts That Matter

Marco Ponza, Luciano Del Corro, Gerhard Weikum


Abstract
This work introduces fact salience: the task of generating a machine-readable representation of the most prominent information in a text document as a set of facts. We also present SalIE, the first fact salience system. SalIE is unsupervised and knowledge agnostic; it uses open information extraction to detect facts in natural language text, PageRank to determine their relevance, and clustering to promote diversity. We compare SalIE with several baselines (including the positional baseline, which is standard for saliency tasks) and, in an extrinsic evaluation, with state-of-the-art automatic text summarizers. SalIE outperforms both the baselines and the text summarizers, showing that facts are an effective way to compress information.
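The rank-then-diversify idea in the abstract can be illustrated with a minimal sketch. This is not the SalIE implementation: it assumes facts are already extracted as (subject, relation, object) triples, uses a hypothetical token-overlap similarity as the graph edge weight, and replaces clustering with a simpler greedy redundancy filter. All function names and parameters below are illustrative assumptions.

```python
from itertools import combinations
from typing import List, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object); assumed input format


def tokens(fact: Fact) -> set:
    """Lowercased word set of a fact (hypothetical helper)."""
    return set(" ".join(fact).lower().split())


def similarity(a: Fact, b: Fact) -> float:
    """Jaccard overlap of tokens; an assumed stand-in for a real fact similarity."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def pagerank(facts: List[Fact], damping: float = 0.85, iterations: int = 50) -> List[float]:
    """Power-iteration PageRank over a similarity-weighted fact graph."""
    n = len(facts)
    if n == 0:
        return []
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = similarity(facts[i], facts[j])
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            out = sum(weights[i])
            for j in range(n):
                # A node with no edges spreads its score evenly over all nodes.
                share = weights[i][j] / out if out > 0 else 1.0 / n
                new[j] += damping * scores[i] * share
        scores = new
    return scores


def select_salient(facts: List[Fact], k: int = 2, threshold: float = 0.5) -> List[Fact]:
    """Take top-ranked facts, skipping any too similar to one already chosen.

    This greedy filter is a simplification used here in place of clustering.
    """
    scores = pagerank(facts)
    ranked = sorted(range(len(facts)), key=lambda i: scores[i], reverse=True)
    chosen: List[int] = []
    for i in ranked:
        if all(similarity(facts[i], facts[j]) < threshold for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return [facts[i] for i in chosen]


if __name__ == "__main__":
    example = [
        ("SalIE", "extracts", "salient facts"),
        ("SalIE", "extracts", "facts"),
        ("PageRank", "ranks", "facts"),
        ("clustering", "promotes", "diversity"),
    ]
    for fact in select_salient(example, k=2):
        print(fact)
```

The damping factor, iteration count, and similarity threshold are arbitrary illustrative defaults, not values taken from the paper.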
Anthology ID:
D18-1129
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
1043–1048
URL:
https://aclanthology.org/D18-1129
DOI:
10.18653/v1/D18-1129
Cite (ACL):
Marco Ponza, Luciano Del Corro, and Gerhard Weikum. 2018. Facts That Matter. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1043–1048, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Facts That Matter (Ponza et al., EMNLP 2018)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/D18-1129.pdf
Code:
mponza/SalIE
Data:
New York Times Annotated Corpus