@inproceedings{susemiehl-mazzarella-2025-ai,
title = "{AI} for Data Ingestion into {IPAC} Archives",
author = "Susemiehl, Nicholas and
Mazzarella, Joseph",
editor = "Accomazzi, Alberto and
Ghosal, Tirthankar and
Grezes, Felix and
Lockhart, Kelly",
booktitle = "Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications",
month = dec,
year = "2025",
address = "Mumbai, India and virtual",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wasp-main.10/",
pages = "87--93",
ISBN = "979-8-89176-310-4",
abstract = "The astronomical data archives at IPAC, including the NASA Extragalactic Database (NED) and NASA Exoplanet Archive (NEA), have served as repositories for data published in the literature for decades. Throughout this time, extracting data from journal articles has remained a challenging task, and future large data releases will exacerbate this problem. We seek to accelerate the rate at which data can be extracted from journal articles and reformatted into database load files by leveraging recent advances in natural language processing enabled by AI. We are developing a new suite of tools to semi-automate information retrieval from scientific journal articles. Manual methods to extract and prepare data, which can take hours for some articles, are being replaced with AI-powered tools that can compress the task to minutes. A combination of AI and non-AI methods, along with human supervision, can substantially accelerate archive data ingestion. Challenges remain for improving accuracy, capturing data in external files, and flagging issues such as mislabeled object names and missing metadata."
}

Markdown (Informal)
[AI for Data Ingestion into IPAC Archives](https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wasp-main.10/) (Susemiehl & Mazzarella, WASP 2025)
ACL
- Nicholas Susemiehl and Joseph Mazzarella. 2025. AI for Data Ingestion into IPAC Archives. In Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications, pages 87–93, Mumbai, India and virtual. Association for Computational Linguistics.