Abstract
A frequent challenge in applications that use entities extracted from text documents is selecting the most salient entities when only a small number can be used by the application (e.g., displayed to a user). This is particularly difficult for extremely short documents, such as the response from a digital assistant, where traditional signals of salience such as position and frequency are less likely to be useful. In this paper, we propose a lightweight and data-efficient approach for entity salience detection on short text documents. Our experiments show that our approach achieves performance competitive with complex state-of-the-art models, such as GPT-4, at significantly lower latency and cost. In limited data settings, we show that a semi-supervised fine-tuning process can improve performance further. We also introduce a novel human-labeled dataset for evaluating entity salience on short question-answer pair documents.
- Anthology ID:
- 2024.emnlp-industry.5
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, US
- Editors:
- Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 50–64
- URL:
- https://preview.aclanthology.org/add_missing_videos/2024.emnlp-industry.5/
- DOI:
- 10.18653/v1/2024.emnlp-industry.5
- Cite (ACL):
- Benjamin Bullough, Harrison Lundberg, Chen Hu, and Weihang Xiao. 2024. Predicting Entity Salience in Extremely Short Documents. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 50–64, Miami, Florida, US. Association for Computational Linguistics.
- Cite (Informal):
- Predicting Entity Salience in Extremely Short Documents (Bullough et al., EMNLP 2024)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2024.emnlp-industry.5.pdf