ZSEE: A Dataset based on Zeolite Synthesis Event Extraction for Automated Synthesis Platform

Song He, Xin Peng, Yihan Cai, Xin Li, Zhiqing Yuan, WenLi Du, Weimin Yang


Abstract
Automated synthesis of zeolites, among the most important catalysts in the chemical industry, holds great significance for attaining economic and environmental benefits. Because structured synthesis data are machine-readable, extracting them from zeolite experimental procedures with NLP technologies can significantly expedite automated synthesis. However, the use of NLP technologies for information extraction in zeolite synthesis remains limited by the lack of annotated datasets. In this paper, we formulate an event extraction task to mine structured synthesis actions from experimental narratives for modular automated synthesis. Furthermore, we introduce ZSEE, a novel dataset containing fine-grained event annotations of zeolite synthesis actions. Our dataset features 16 event types and 13 argument roles that cover all the experimental operational steps of zeolite synthesis. We evaluate current state-of-the-art event extraction methods on ZSEE, perform error analysis based on the experimental results, and summarize the challenges and corresponding research directions to further facilitate the automated synthesis of zeolites. The code is publicly available at https://github.com/Hi-0317/ZSEE.
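To make the idea of an "event" concrete, the sketch below shows one plausible way to represent a single annotated synthesis action: a trigger word, an event type, and a list of role-labelled arguments with character offsets, flattened into a step an automated platform could consume. This is a minimal illustration under stated assumptions, not the ZSEE file format or schema: the event type "Stir", the roles "Temperature" and "Duration", the example sentence, and the helper names (`locate`, `make_argument`, `to_action`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    role: str   # hypothetical role label, e.g. "Temperature" (not the ZSEE role set)
    text: str   # argument string as it appears in the sentence
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


@dataclass
class Event:
    event_type: str  # hypothetical event type label, e.g. "Stir" (not the ZSEE type set)
    trigger: str     # word in the sentence that evokes the event
    start: int       # trigger offset, inclusive
    end: int         # trigger offset, exclusive
    arguments: list[Argument] = field(default_factory=list)


def locate(sentence: str, phrase: str) -> tuple[int, int]:
    """Return the character span of the first occurrence of phrase."""
    start = sentence.index(phrase)
    return start, start + len(phrase)


def make_argument(sentence: str, role: str, phrase: str) -> Argument:
    """Build an Argument whose offsets point at phrase inside sentence."""
    start, end = locate(sentence, phrase)
    return Argument(role, phrase, start, end)


def to_action(event: Event) -> dict[str, str]:
    """Flatten an event into one machine-readable step for a synthesis controller.
    If a role occurs twice, the later argument overwrites the earlier one."""
    step = {"action": event.event_type}
    for arg in event.arguments:
        step[arg.role] = arg.text
    return step


# Hypothetical example sentence, not taken from the dataset.
sentence = "The gel was stirred at 80 °C for 2 h."
t_start, t_end = locate(sentence, "stirred")
event = Event(
    event_type="Stir",
    trigger="stirred",
    start=t_start,
    end=t_end,
    arguments=[
        make_argument(sentence, "Temperature", "80 °C"),
        make_argument(sentence, "Duration", "2 h"),
    ],
)
print(to_action(event))
# {'action': 'Stir', 'Temperature': '80 °C', 'Duration': '2 h'}
```

Keeping character offsets alongside the surface strings lets extraction output be checked against the source text, while the flattened dictionary is the kind of structured step a modular synthesis platform could execute in order.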
Anthology ID: 2024.findings-naacl.116
Volume: Findings of the Association for Computational Linguistics: NAACL 2024
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1791–1808
URL: https://preview.aclanthology.org/build-pipeline-with-new-library/2024.findings-naacl.116/
DOI: 10.18653/v1/2024.findings-naacl.116
Cite (ACL): Song He, Xin Peng, Yihan Cai, Xin Li, Zhiqing Yuan, WenLi Du, and Weimin Yang. 2024. ZSEE: A Dataset based on Zeolite Synthesis Event Extraction for Automated Synthesis Platform. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 1791–1808, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): ZSEE: A Dataset based on Zeolite Synthesis Event Extraction for Automated Synthesis Platform (He et al., Findings 2024)
PDF: https://preview.aclanthology.org/build-pipeline-with-new-library/2024.findings-naacl.116.pdf