KEPLET: Knowledge-Enhanced Pretrained Language Model with Topic Entity Awareness

Yichuan Li, Jialong Han, Kyumin Lee, Chengyuan Ma, Benjamin Yao, Xiaohu Liu


Abstract
In recent years, Pre-trained Language Models (PLMs) have shown their superiority by pre-training on unstructured text corpora and then fine-tuning on downstream tasks. On entity-rich textual resources like Wikipedia, Knowledge-Enhanced PLMs (KEPLMs) incorporate the interactions between tokens and mentioned entities in pre-training, and are thus more effective on entity-centric tasks such as entity linking and relation classification. Although they exploit Wikipedia’s rich structure to some extent, conventional KEPLMs still neglect a unique layout of the corpus: each Wikipedia page is centered on a topic entity (identified by the page URL and shown in the page title). In this paper, we demonstrate that KEPLMs that do not incorporate topic entities suffer from insufficient entity interaction and biased (relation) word semantics. We thus propose KEPLET, a novel Knowledge-Enhanced Pre-trained LanguagE model with Topic entity awareness. In an end-to-end manner, KEPLET identifies where to add the topic entity’s information in a Wikipedia sentence, fuses such information into token and mentioned-entity representations, and supervises the network learning, thereby taking topic entities back into consideration. Experiments demonstrate the generality and superiority of KEPLET: applied to two representative KEPLMs, it achieves significant improvements on four entity-centric tasks.
Anthology ID:
2023.findings-emnlp.458
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6864–6876
URL:
https://aclanthology.org/2023.findings-emnlp.458
DOI:
10.18653/v1/2023.findings-emnlp.458
Cite (ACL):
Yichuan Li, Jialong Han, Kyumin Lee, Chengyuan Ma, Benjamin Yao, and Xiaohu Liu. 2023. KEPLET: Knowledge-Enhanced Pretrained Language Model with Topic Entity Awareness. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 6864–6876, Singapore. Association for Computational Linguistics.
Cite (Informal):
KEPLET: Knowledge-Enhanced Pretrained Language Model with Topic Entity Awareness (Li et al., Findings 2023)
PDF:
https://preview.aclanthology.org/naacl24-info/2023.findings-emnlp.458.pdf