Abstract
This work introduces fact salience: the task of generating a machine-readable representation of the most prominent information in a text document as a set of facts. We also present SalIE, the first fact salience system. SalIE is unsupervised and knowledge agnostic; it uses open information extraction to detect facts in natural language text, PageRank to determine their relevance, and clustering to promote diversity. We compare SalIE with several baselines (including a positional baseline, standard for saliency tasks) and, in an extrinsic evaluation, with state-of-the-art automatic text summarizers. SalIE outperforms both the baselines and the text summarizers, showing that facts are an effective way to compress information.
- Anthology ID:
- D18-1129
- Volume:
- Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
- Month:
- October-November
- Year:
- 2018
- Address:
- Brussels, Belgium
- Editors:
- Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1043–1048
- URL:
- https://aclanthology.org/D18-1129
- DOI:
- 10.18653/v1/D18-1129
- Cite (ACL):
- Marco Ponza, Luciano Del Corro, and Gerhard Weikum. 2018. Facts That Matter. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1043–1048, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- Facts That Matter (Ponza et al., EMNLP 2018)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/D18-1129.pdf
- Code:
- mponza/SalIE
- Data:
- New York Times Annotated Corpus
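
The abstract describes a three-stage pipeline: extract facts, rank them with PageRank, then promote diversity. The sketch below illustrates only the general idea and is not SalIE's implementation. It rests on several assumptions that do not come from the paper: facts are hand-written `(subject, relation, object)` triples rather than open information extraction output, graph edges are weighted by token overlap (Jaccard), and a greedy redundancy filter stands in for the clustering step. All function names and parameters (`salient_facts`, `max_overlap`, and so on) are hypothetical.

```python
from itertools import combinations


def tokens(fact):
    """Lowercased word set of a (subject, relation, object) triple."""
    return set(" ".join(fact).lower().split())


def jaccard(a, b):
    """Token-overlap similarity between two facts, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def pagerank(weights, n, damping=0.85, iterations=50):
    """Power-iteration PageRank on a weighted undirected graph.

    `weights` maps (i, j) with i < j to a positive edge weight.
    """
    adj = [[0.0] * n for _ in range(n)]
    for (i, j), w in weights.items():
        adj[i][j] = w
        adj[j][i] = w
    out = [sum(row) for row in adj]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Nodes without edges spread their rank uniformly over all nodes.
        dangling = sum(rank[i] for i in range(n) if out[i] == 0)
        new_rank = []
        for j in range(n):
            incoming = sum(
                rank[i] * adj[i][j] / out[i] for i in range(n) if out[i] > 0
            )
            new_rank.append(
                (1 - damping) / n + damping * (incoming + dangling / n)
            )
        rank = new_rank
    return rank


def salient_facts(facts, k=2, max_overlap=0.5):
    """Return up to k high-ranked facts that are not near-duplicates."""
    n = len(facts)
    if n == 0:
        return []
    # Assumption: edge weight = token overlap between two facts.
    weights = {}
    for i, j in combinations(range(n), 2):
        sim = jaccard(facts[i], facts[j])
        if sim > 0:
            weights[(i, j)] = sim
    rank = pagerank(weights, n)
    chosen = []
    # Greedy filter (a stand-in for clustering): skip any fact that
    # overlaps too much with one that has already been selected.
    for i in sorted(range(n), key=lambda idx: rank[idx], reverse=True):
        if all(jaccard(facts[i], facts[c]) <= max_overlap for c in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return [facts[i] for i in chosen]
```

Calling `salient_facts` on a list of triples returns at most `k` of them, ordered by rank. Raising `max_overlap` lets more similar facts through, while lowering it favors diversity over rank.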