A Unified Framework for N-ary Property Information Extraction in Materials Science

Van-Thuy Phi, Yuji Matsumoto


Abstract
This paper presents a unified framework for extracting n-ary property information from materials science literature, addressing the critical challenge of capturing complex relationships that often span multiple sentences. We introduce three complementary approaches: RE-Composition, which transforms binary relations into n-ary structures; Direct EAE (event argument extraction), which models polymer properties as events with multiple arguments; and LLM-Guided Assembly, which leverages high-confidence entity and relation outputs to guide structured extraction. Our framework is built upon two novel resources: MatSciNERE, a comprehensive corpus for materials science entities and relations, and PolyEE, a specialized corpus for polymer property events. Through strategic synthetic data generation for both the named entity recognition (NER) and EAE tasks, we achieve significant performance improvements (up to 5.34 F1 points). Experiments demonstrate that our combined approaches outperform any single method, with the LLM-guided approach achieving the highest F1 score (71.53%). The framework enables more comprehensive knowledge extraction from scientific literature, supporting materials discovery and database curation applications. We plan to release our resources and trained models to the research community.
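The RE-Composition idea, turning binary relations into n-ary structures, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The relation labels (`has_property`, `has_value`, `measured_at`), the `Relation` and `PropertyRecord` types, and the `compose` function are all hypothetical. Mention strings are assumed to act as unique span identifiers; a real system would key on character offsets so that two separate mentions of "Tg" are not merged.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """A binary relation between two entity mentions (hypothetical schema)."""
    head: str   # head mention; assumed to act as a unique span ID
    label: str  # hypothetical labels: "has_property", "has_value", "measured_at"
    tail: str   # tail mention; assumed to act as a unique span ID


@dataclass
class PropertyRecord:
    """An n-ary record: one material, one property, and its arguments."""
    material: str
    prop: str
    values: list = field(default_factory=list)
    conditions: list = field(default_factory=list)


def compose(relations):
    """Chain binary relations that share a property mention into n-ary records.

    Each "has_property" edge (material -> property) seeds a record; edges whose
    head is that property mention then fill its value and condition slots.
    """
    # Index outgoing edges by head mention so a property's arguments are easy to find.
    outgoing = defaultdict(list)
    for rel in relations:
        outgoing[rel.head].append(rel)

    records = []
    for rel in relations:
        if rel.label != "has_property":
            continue
        record = PropertyRecord(material=rel.head, prop=rel.tail)
        for arg in outgoing[rel.tail]:
            if arg.label == "has_value":
                record.values.append(arg.tail)
            elif arg.label == "measured_at":
                record.conditions.append(arg.tail)
        records.append(record)
    return records


if __name__ == "__main__":
    # Toy input (illustrative only): three binary relations from one passage.
    toy = [
        Relation("PMMA", "has_property", "Tg"),
        Relation("Tg", "has_value", "105 °C"),
        Relation("Tg", "measured_at", "10 K/min"),
    ]
    for record in compose(toy):
        print(record)
```

Keying records on the shared property mention is the simplest possible chaining rule. Linking relations that cross sentence boundaries, which the abstract identifies as the core difficulty, would additionally need coreference or offset-based linking, which this sketch omits.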
Anthology ID:
2025.findings-emnlp.128
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2369–2388
URL:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.128/
DOI:
10.18653/v1/2025.findings-emnlp.128
Cite (ACL):
Van-Thuy Phi and Yuji Matsumoto. 2025. A Unified Framework for N-ary Property Information Extraction in Materials Science. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 2369–2388, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
A Unified Framework for N-ary Property Information Extraction in Materials Science (Phi & Matsumoto, Findings 2025)
PDF:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.128.pdf
Checklist:
2025.findings-emnlp.128.checklist.pdf