Abstract
Article comprehension is an important challenge in natural language processing, with applications such as article generation and image-to-article retrieval. Prior work typically encodes all tokens in an article uniformly using pretrained language models. However, in many applications, such as understanding news stories, articles are based on real-world events and may reference many named entities that language models struggle to recognize and predict accurately. To address this challenge, we propose an ENtity-aware article GeneratIoN and rEtrieval (ENGINE) framework that explicitly incorporates named entities into language models. ENGINE has two main components: a named-entity extraction module that extracts named entities from both the metadata and the embedded images associated with articles, and an entity-aware mechanism that enhances the model’s ability to recognize and predict entity names. Experiments on three public datasets (GoodNews, VisualNews, and WikiText) demonstrate that our model boosts both article generation and article retrieval performance, with a 4-5 point perplexity improvement in article generation and a 3-4% boost in recall@1 in article retrieval. We release our implementation at [https://github.com/Zhongping-Zhang/ENGINE](https://github.com/Zhongping-Zhang/ENGINE).

- Anthology ID: 2023.findings-emnlp.581
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Houda Bouamor, Juan Pino, Kalika Bali
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 8684–8704
- URL: https://aclanthology.org/2023.findings-emnlp.581
- DOI: 10.18653/v1/2023.findings-emnlp.581
- Cite (ACL): Zhongping Zhang, Yiwen Gu, and Bryan Plummer. 2023. Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 8684–8704, Singapore. Association for Computational Linguistics.
- Cite (Informal): Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval (Zhang et al., Findings 2023)
- PDF: https://preview.aclanthology.org/nschneid-patch-5/2023.findings-emnlp.581.pdf