Abstract
We propose a global entity disambiguation (ED) model based on BERT. To capture global contextual information for ED, our model treats not only words but also entities as input tokens, and solves the task by sequentially resolving mentions to their referent entities and using resolved entities as inputs at each step. We train the model using a large entity-annotated corpus obtained from Wikipedia. We achieve new state-of-the-art results on five standard ED datasets: AIDA-CoNLL, MSNBC, AQUAINT, ACE2004, and WNED-WIKI. The source code and model checkpoint are available at https://github.com/studio-ousia/luke.
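The abstract describes an iterative decoding procedure: mentions are resolved one at a time, and each committed entity is fed back into the model input so that later decisions can condition on it. The Python sketch below illustrates only that control flow, under stated assumptions: `score_candidates` and `resolve_globally` are hypothetical stand-ins, not the studio-ousia/luke API, and the stub scorer replaces the paper's BERT-based model so the loop runs end to end.

```python
# Minimal sketch of the sequential (global) resolution loop described in
# the abstract. All names here are hypothetical illustrations, not the
# studio-ousia/luke API; the stub scorer stands in for the BERT model.
import random
from typing import Dict, List, Tuple

random.seed(0)

def score_candidates(
    words: List[str],
    resolved: Dict[int, str],          # mention index -> committed entity
    candidates: Dict[int, List[str]],  # mention index -> candidate entities
) -> Dict[int, List[Tuple[str, float]]]:
    """Stub scorer. A real implementation would encode the words together
    with the already-resolved entity tokens using BERT and return a
    probability for each candidate of each unresolved mention."""
    return {
        m: [(c, random.random()) for c in cands]
        for m, cands in candidates.items()
        if m not in resolved
    }

def resolve_globally(
    words: List[str], candidates: Dict[int, List[str]]
) -> Dict[int, str]:
    """Resolve mentions one at a time, committing the single most
    confident (mention, entity) prediction at each step and feeding
    it back into the next scoring pass."""
    resolved: Dict[int, str] = {}
    while len(resolved) < len(candidates):
        scores = score_candidates(words, resolved, candidates)
        # Pick the (mention, entity) pair with the highest score
        # across all still-unresolved mentions, then commit it.
        best_m, (best_e, _) = max(
            ((m, max(pairs, key=lambda p: p[1])) for m, pairs in scores.items()),
            key=lambda x: x[1][1],
        )
        resolved[best_m] = best_e
    return resolved

words = "Maradona played for Napoli".split()
candidates = {0: ["Diego_Maradona", "Maradona_(film)"],
              3: ["S.S.C._Napoli", "Naples"]}
print(resolve_globally(words, candidates))
```

Committing the most confident prediction first mirrors the confidence-ordered decoding the paper reports; easy mentions are fixed early and then serve as global context for harder ones.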
- Anthology ID: 2022.naacl-main.238
- Volume: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Month: July
- Year: 2022
- Address: Seattle, United States
- Venue: NAACL
- Publisher: Association for Computational Linguistics
- Pages: 3264–3271
- URL: https://aclanthology.org/2022.naacl-main.238
- DOI: 10.18653/v1/2022.naacl-main.238
- Cite (ACL): Ikuya Yamada, Koki Washio, Hiroyuki Shindo, and Yuji Matsumoto. 2022. Global Entity Disambiguation with BERT. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3264–3271, Seattle, United States. Association for Computational Linguistics.
- Cite (Informal): Global Entity Disambiguation with BERT (Yamada et al., NAACL 2022)
- PDF: https://preview.aclanthology.org/paclic-22-ingestion/2022.naacl-main.238.pdf
- Code: studio-ousia/luke
- Data: ACE 2004, AIDA CoNLL-YAGO, AQUAINT