Xin Peng
2024
ZSEE: A Dataset based on Zeolite Synthesis Event Extraction for Automated Synthesis Platform
Song He
|
Xin Peng
|
Yihan Cai
|
Xin Li
|
Zhiqing Yuan
|
WenLi Du
|
Weimin Yang
Findings of the Association for Computational Linguistics: NAACL 2024
Automated synthesis of zeolite, one of the most important catalysts in chemical industries, holds great significance for attaining economic and environmental benefits. Structural synthesis data extracted through NLP technologies from zeolite experimental procedures can significantly expedite automated synthesis owing to its machine readability. However, the utilization of NLP technologies in information extraction of zeolite synthesis remains restricted due to the lack of annotated datasets. In this paper, we formulate an event extraction task to mine structural synthesis actions from experimental narratives for modular automated synthesis. Furthermore, we introduce ZSEE, a novel dataset containing fine-grained event annotations of zeolite synthesis actions. Our dataset features 16 event types and 13 argument roles which cover all the experimental operational steps of zeolite synthesis. We explore current state-of-the-art event extraction methods on ZSEE, perform error analysis based on the experimental results, and summarize the challenges and corresponding research directions to further facilitate the automated synthesis of zeolites. The code is publicly available at https://github.com/Hi-0317/ZSEE.
Unlocking Parameter-Efficient Fine-Tuning for Low-Resource Language Translation
Tong Su
|
Xin Peng
|
Sarubi Thillainathan
|
David Guzmán
|
Surangika Ranathunga
|
En-Shiun Lee
Findings of the Association for Computational Linguistics: NAACL 2024
Parameter-efficient fine-tuning (PEFT) methods are increasingly vital in adapting large-scale pre-trained language models for diverse tasks, offering a balance between adaptability and computational efficiency. They are important in Low-Resource Language (LRL) Neural Machine Translation (NMT) to enhance translation accuracy with minimal resources. However, their practical effectiveness varies significantly across different languages. We conducted comprehensive empirical experiments with varying LRL domains and sizes to evaluate the performance of 8 PEFT methods with in total of 15 architectures using the SacreBLEU score. We showed that 6 PEFT architectures outperform the baseline for both in-domain and out-domain tests and the Houlsby+Inversion adapter has the best performance overall, proving the effectiveness of PEFT methods.