Boosting Entity Linking Performance by Leveraging Unlabeled Documents

Phong Le; Ivan Titov

doi:10.18653/v1/P19-1187

Abstract

Modern entity linking systems rely on large collections of documents specifically annotated for the task (e.g., AIDA CoNLL). In contrast, we propose an approach that exploits only naturally occurring information: unlabeled documents and Wikipedia. Our approach consists of two stages. First, we construct a high-recall list of candidate entities for each mention in an unlabeled document. Second, we use the candidate lists as weak supervision to constrain our document-level entity linking model. The model treats entities as latent variables and, when estimated on a collection of unlabeled texts, learns to choose entities relying both on the local context of each mention and on coherence with other entities in the document. The resulting approach rivals fully supervised state-of-the-art systems on standard test sets. It also approaches their performance in a very challenging setting: when evaluated on a test set sampled from the data used to estimate the supervised systems. By comparing to Wikipedia-only training of our model, we demonstrate that modeling unlabeled documents is beneficial.
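
The idea of using a candidate list as weak supervision can be illustrated with a small sketch. This is only one common way to train against a set-valued label and is not necessarily the objective used in the paper: it assumes a model that assigns a score to every entity in some pool, and it penalizes probability mass that falls outside the candidate list. The function name `candidate_set_loss`, the entity names, and the scores are all hypothetical and chosen for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def candidate_set_loss(scores, candidates):
    """Negative log of the softmax probability mass on the candidate set.

    scores:     dict mapping entity name -> model score (hypothetical model)
    candidates: set of entity names from the high-recall candidate list

    The loss is 0 when all mass is on the candidates and grows as the
    model puts more mass on entities outside the list. The correct
    entity is never identified; it stays latent inside the set.
    """
    candidates = set(candidates)
    if not candidates or not candidates <= set(scores):
        raise ValueError("candidates must be a non-empty subset of scored entities")
    log_z_all = logsumexp(list(scores.values()))
    log_z_cand = logsumexp([scores[e] for e in candidates])
    return log_z_all - log_z_cand


# Illustrative scores for the mention "Paris" (made-up numbers).
scores = {
    "Paris": 2.0,
    "Paris_Hilton": 0.5,
    "Paris,_Texas": 0.1,
    "London": 1.5,   # not in the candidate list for this mention
}
candidates = {"Paris", "Paris_Hilton", "Paris,_Texas"}
loss = candidate_set_loss(scores, candidates)
```

Minimizing such a loss over many unlabeled documents only pushes the model toward the candidate list as a whole; choosing among the candidates has to come from other signals, which in the abstract's description are the local context of the mention and coherence with other entities in the document.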

Anthology ID: P19-1187
Volume: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month: July
Year: 2019
Address: Florence, Italy
Editors: Anna Korhonen, David Traum, Lluís Màrquez
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 1935–1945
URL: https://preview.aclanthology.org/fix-sig-urls/P19-1187/
DOI: 10.18653/v1/P19-1187
Cite (ACL): Phong Le and Ivan Titov. 2019. Boosting Entity Linking Performance by Leveraging Unlabeled Documents. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1935–1945, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): Boosting Entity Linking Performance by Leveraging Unlabeled Documents (Le & Titov, ACL 2019)
PDF: https://preview.aclanthology.org/fix-sig-urls/P19-1187.pdf
Video: https://preview.aclanthology.org/fix-sig-urls/P19-1187.mp4
Code: lephong/wnel
Data: AIDA CoNLL-YAGO, CoNLL
