S2abEL: A Dataset for Entity Linking from Scientific Tables

Yuze Lou, Bailey Kuehl, Erin Bransom, Sergey Feldman, Aakanksha Naik, Doug Downey


Abstract
Entity linking (EL) is the task of linking a textual mention to its corresponding entry in a knowledge base, and is critical for many knowledge-intensive NLP applications. When applied to tables in scientific papers, EL is a step toward large-scale scientific knowledge bases that could enable advanced scientific question answering and analytics. We present the first dataset for EL in scientific tables. EL for scientific tables is especially challenging because scientific knowledge bases can be very incomplete, and disambiguating table mentions typically requires understanding the paper’s text in addition to the table. Our dataset, Scientific Table Entity Linking (S2abEL), focuses on EL in machine learning results tables and includes hand-labeled cell types, attributed sources, and entity links from the PaperswithCode taxonomy for 8,429 cells from 732 tables. We introduce a neural baseline method designed for EL on scientific tables containing many out-of-knowledge-base mentions, and show that it significantly outperforms a state-of-the-art generic table EL method. The best baselines fall below human performance, and our analysis highlights avenues for improvement.
Anthology ID:
2023.emnlp-main.186
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
3089–3101
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2023.emnlp-main.186/
DOI:
10.18653/v1/2023.emnlp-main.186
Cite (ACL):
Yuze Lou, Bailey Kuehl, Erin Bransom, Sergey Feldman, Aakanksha Naik, and Doug Downey. 2023. S2abEL: A Dataset for Entity Linking from Scientific Tables. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 3089–3101, Singapore. Association for Computational Linguistics.
Cite (Informal):
S2abEL: A Dataset for Entity Linking from Scientific Tables (Lou et al., EMNLP 2023)
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2023.emnlp-main.186.pdf
Video:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2023.emnlp-main.186.mp4