AI for Data Ingestion into IPAC Archives

Nicholas Susemiehl, Joseph Mazzarella


Abstract
The astronomical data archives at IPAC, including the NASA Extragalactic Database (NED) and NASA Exoplanet Archive (NEA), have served as repositories for data published in the literature for decades. Throughout this time, extracting data from journal articles has remained a challenging task, and future large data releases will exacerbate this problem. We seek to accelerate the rate at which data can be extracted from journal articles and reformatted into database load files by leveraging recent advances in natural language processing enabled by AI. We are developing a new suite of tools to semi-automate information retrieval from scientific journal articles. Manual methods to extract and prepare data, which can take hours for some articles, are being replaced with AI-powered tools that can compress the task to minutes. A combination of AI and non-AI methods, along with human supervision, can substantially accelerate archive data ingestion. Challenges remain for improving accuracy, capturing data in external files, and flagging issues such as mislabeled object names and missing metadata.
Anthology ID:
2025.wasp-main.10
Volume:
Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications
Month:
December
Year:
2025
Address:
Mumbai, India and virtual
Editors:
Alberto Accomazzi, Tirthankar Ghosal, Felix Grezes, Kelly Lockhart
Venues:
WASP | WS
Publisher:
Association for Computational Linguistics
Pages:
87–93
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wasp-main.10/
Cite (ACL):
Nicholas Susemiehl and Joseph Mazzarella. 2025. AI for Data Ingestion into IPAC Archives. In Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications, pages 87–93, Mumbai, India and virtual. Association for Computational Linguistics.
Cite (Informal):
AI for Data Ingestion into IPAC Archives (Susemiehl & Mazzarella, WASP 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wasp-main.10.pdf