Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval

Zhongping Zhang, Yiwen Gu, Bryan Plummer


Abstract
Article comprehension is an important challenge in natural language processing with many applications, such as article generation and image-to-article retrieval. Prior work typically encodes all tokens in an article uniformly using pretrained language models. However, in many applications, such as understanding news stories, articles are based on real-world events and may reference many named entities that are difficult for language models to recognize and predict accurately. To address this challenge, we propose an ENtity-aware article GeneratIoN and rEtrieval (ENGINE) framework that explicitly incorporates named entities into language models. ENGINE has two main components: a named-entity extraction module that extracts named entities from both the metadata and the embedded images associated with articles, and an entity-aware mechanism that enhances the model’s ability to recognize and predict entity names. We conducted experiments on three public datasets (GoodNews, VisualNews, and WikiText), and our results demonstrate that our model boosts both article generation and article retrieval performance, with a 4-5 perplexity improvement in article generation and a 3-4% boost in recall@1 in article retrieval. We release our implementation at https://github.com/Zhongping-Zhang/ENGINE.
Anthology ID:
2023.findings-emnlp.581
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
8684–8704
URL:
https://aclanthology.org/2023.findings-emnlp.581
DOI:
10.18653/v1/2023.findings-emnlp.581
Cite (ACL):
Zhongping Zhang, Yiwen Gu, and Bryan Plummer. 2023. Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 8684–8704, Singapore. Association for Computational Linguistics.
Cite (Informal):
Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval (Zhang et al., Findings 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2023.findings-emnlp.581.pdf