Feng Zhu
2025
ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data
Yu Zhang
|
Ruijie Yu
|
Jidong Tian
|
Feng Zhu
|
Jiapeng Liu
|
Xiaokang Yang
|
Yaohui Jin
|
Yanyan Xu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
With the increasing interest in robotic synthesis in the context of organic chemistry, the automated extraction of chemical procedures from literature is critical. However, this task remains challenging due to the inherent ambiguity of chemical language and the high cost of human annotation required for developing reliable computer-aided extraction protocols. Here, we present ChemActor, a fully fine-tuned large language model (LLM), as a chemical executor to convert between unstructured experimental procedures and structured action sequences. We propose a sequential LLM-generated data framework to address the challenges of insufficient and low-quality annotated data. This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input. Additionally, we introduce a novel multi-round LLMs circle review metric, which reflects the model’s advanced understanding of chemical experimental procedures. Extensive experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor, augmented by LLM-generated data, achieves state-of-the-art performance, outperforming the baseline model by 10%. The code is available at: https://github.com/Zhanghahah/ChemActor.
2019
AUTOHOME-ORCA at SemEval-2019 Task 8: Application of BERT for Fact-Checking in Community Forums
Zhengwei Lv
|
Duoxing Liu
|
Haifeng Sun
|
Xiao Liang
|
Tao Lei
|
Zhizhong Shi
|
Feng Zhu
|
Lei Yang
Proceedings of the 13th International Workshop on Semantic Evaluation
Fact checking is an important task for maintaining high quality posts and improving user experience in Community Question Answering forums. Therefore, the SemEval-2019 task 8 is aimed to identify factual question (subtask A) and detect true factual information from corresponding answers (subtask B). In order to address this task, we propose a system based on the BERT model with meta information of questions. For the subtask A, the outputs of fine-tuned BERT classification model are combined with the feature of length of questions to boost the performance. For the subtask B, the predictions of several variants of BERT model encoding the meta information are combined to create an ensemble model. Our system achieved competitive results with an accuracy of 0.82 in the subtask A and 0.83 in the subtask B. The experimental results validate the effectiveness of our system.
Search
Fix author
Co-authors
- Yaohui Jin 1
- Tao Lei (雷涛) 1
- Xiao Liang (梁霄) 1
- Jiapeng Liu 1
- Duoxing Liu (刘多星) 1
- show all...