Abstract
Tracking state-of-the-art (SOTA) results in machine learning studies is challenging due to high publication volume. Existing methods for creating leaderboards in scientific documents require significant human supervision or rely on scarcely available LaTeX source files. We propose Table Entity LINker (TELIN), a framework which extracts (task, model, dataset, metric) quadruples from collections of scientific publications in PDF format. TELIN identifies scientific named entities, constructs a knowledge base, and leverages human feedback to iteratively refine automatic extractions. TELIN identifies and prioritizes uncertain and impactful entities for human review to create a cascade effect for leaderboard completion. We show that TELIN is competitive with the SOTA but requires much less human annotation.
- Anthology ID:
- 2022.wiesp-1.3
- Volume:
- Proceedings of the first Workshop on Information Extraction from Scientific Publications
- Month:
- November
- Year:
- 2022
- Address:
- Online
- Venue:
- WIESP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 20–25
- URL:
- https://aclanthology.org/2022.wiesp-1.3
- Cite (ACL):
- Sean Yang, Chris Tensmeyer, and Curtis Wigington. 2022. TELIN: Table Entity LINker for Extracting Leaderboards from Machine Learning Publications. In Proceedings of the first Workshop on Information Extraction from Scientific Publications, pages 20–25, Online. Association for Computational Linguistics.
- Cite (Informal):
- TELIN: Table Entity LINker for Extracting Leaderboards from Machine Learning Publications (Yang et al., WIESP 2022)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2022.wiesp-1.3.pdf