Lihao Wang
2026
R3: End-to-End Reasoning-based Planning for Multi-step Retrosynthesis via Reinforcement Learning
YiFei Wang | Qizhi Pei | Jiangtao Feng | Yuntian Shi | Yi Duan | Lihao Wang | Lei Bai | Lijun Wu | Wei-Ying Ma | Hao Zhou
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multi-step retrosynthetic planning is a fundamental challenge in organic chemistry, traditionally modeled as a combinatorial search problem guided by single-step prediction models. However, this search-centric paradigm often disconnects from the explicit chemical reasoning processes employed by human experts. In this paper, we propose R3 (Reinforced Reasoning Retrosynthesis), a novel framework that reformulates this task as end-to-end generative reasoning. Instead of traversing a search tree, R3 simulates the problem-solving logic of chemists to directly generate complete synthetic pathways. To achieve this, we initialize the model with domain knowledge and employ end-to-end Reinforcement Learning (RL) to optimize the entire planning policy. Experimental results on Retrobench show that R3 achieves a state-of-the-art Top-1 accuracy of 43.7%, demonstrating that generative reasoning offers a superior alternative to traditional search algorithms in solving complex retrosynthetic problems.
2020
Improving Grammatical Error Correction Models with Purpose-Built Adversarial Examples
Lihao Wang | Xiaoqing Zheng
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Sequence-to-sequence (seq2seq) learning with neural networks has empirically been shown to be an effective framework for grammatical error correction (GEC), which takes a sentence with errors as input and outputs the corrected one. However, the performance of seq2seq GEC models relies heavily on the size and quality of the available corpus. We propose a method inspired by adversarial training that generates more meaningful and valuable training examples by continually identifying the weak spots of a model, and that enhances the model by gradually adding the generated adversarial examples to the training set. Extensive experimental results show that such adversarial training can improve both the generalization and robustness of GEC models.