Sihang Li
2026
ProtoCycle: Reflective Tool-Augmented Planning for Text-Guided Protein Design
Yutang Ge | Guojiang Zhao | Sihang Li | Zheng Cheng | Zifeng Zhao | Hanchen Xia | Guolin Ke | Linfeng Zhang | Zhifeng Gao | Yu Guang Wang
Findings of the Association for Computational Linguistics: ACL 2026
Designing proteins that satisfy natural language functional requirements is a central goal in protein engineering. A straightforward baseline is to fine-tune generic instruction-tuned LLMs as direct text-to-sequence generators, but this is data- and compute-hungry. With limited supervision, LLMs can produce coherent plans in text yet fail to reliably realize them as sequences. This plan–execute gap motivates ProtoCycle, an agentic framework for protein design that uses LLMs primarily to drive a multi-round, feedback-driven decision cycle. ProtoCycle couples an LLM planner with a lightweight tool environment designed to emulate the iterative workflow of human protein engineers and uses LLM-driven reflection on tool feedback to revise plans. Trained with supervised trajectories and online reinforcement learning, ProtoCycle achieves strong language alignment while maintaining competitive foldability, and ablations show that reflection substantially improves sequence quality.
Chronos: Learning Temporal Dynamics of Reasoning Chains for Test-Time Scaling
Kai Zhang | Jiayi Liao | Chengpeng Li | Ziyuan Xie | Sihang Li | Xiang Wang
Findings of the Association for Computational Linguistics: ACL 2026
Test-Time Scaling (TTS) has emerged as an effective paradigm for improving the reasoning performance of large language models (LLMs). However, existing methods, most notably majority voting and heuristic token-level scoring, treat all reasoning traces or tokens equally and are therefore susceptible to large variations in trajectory quality and to localized logical failures. In this work, we introduce **Chronos**, a lightweight, plug-and-play chronological reasoning scorer that models each trajectory as a time series. Specifically, Chronos learns features of each trajectory's token-probability sequence, assigns the trajectory a quality score accordingly, and aggregates answers with a weighted voting mechanism. Extensive evaluations on both in-domain and out-of-domain benchmarks demonstrate that Chronos consistently delivers substantial gains across a variety of models with negligible computational overhead. Notably, with Qwen3-4B-Thinking-2507 on HMMT25, Chronos@128 achieves relative improvements of 34.21% over Pass@1 and 22.70% over Maj@128, highlighting its effectiveness.
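The weighted-voting step can be illustrated with a small Python sketch. This is not Chronos itself: the abstract says the scorer is learned, whereas `trajectory_score` below is a hypothetical, hand-set stand-in that mixes two time-series features of the per-token log-probabilities (the mean and the least confident sliding window). The names `trajectory_score`, `majority_vote`, and `weighted_vote`, and the `window` and `alpha` parameters, are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def trajectory_score(token_logprobs: list[float], window: int = 2, alpha: float = 0.5) -> float:
    """Hypothetical stand-in for a learned trajectory scorer.

    Treats per-token log-probabilities as a time series and combines a global
    feature (mean log-prob) with a local one (the least confident sliding
    window). The fixed weight `alpha` replaces learned parameters.
    """
    if not token_logprobs:
        raise ValueError("empty trajectory")
    n = len(token_logprobs)
    w = min(window, n)
    mean_lp = sum(token_logprobs) / n
    # Lowest windowed mean: flags a localized dip in model confidence.
    worst_window = min(sum(token_logprobs[i:i + w]) / w for i in range(n - w + 1))
    # Log-probs are <= 0, so exp(...) maps the score into (0, 1].
    return math.exp(alpha * mean_lp + (1 - alpha) * worst_window)


def majority_vote(answers: list[str]) -> str:
    """Plain Maj@N: every trace counts equally."""
    return Counter(answers).most_common(1)[0][0]


def weighted_vote(answers: list[str], scores: list[float]) -> str:
    """Sum trace scores per final answer and return the heaviest answer."""
    totals: dict[str, float] = defaultdict(float)
    for ans, s in zip(answers, scores):
        totals[ans] += s
    return max(totals, key=totals.get)


answers = ["A", "A", "A", "B", "B"]
logprob_traces = [
    [-0.1, -3.0, -3.0, -0.1],  # three traces with a sharp confidence dip
    [-0.1, -3.0, -3.0, -0.1],
    [-0.1, -3.0, -3.0, -0.1],
    [-0.1, -0.1, -0.1, -0.1],  # two uniformly confident traces
    [-0.1, -0.1, -0.1, -0.1],
]
scores = [trajectory_score(lp) for lp in logprob_traces]
print(majority_vote(answers))          # A
print(weighted_vote(answers, scores))  # B
```

In this toy example three low-confidence traces win under plain majority voting, while confidence-weighted voting picks the answer backed by the two confident traces, which is the weakness of equal-weight voting that the abstract points to.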
2025
FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models
Hengxing Cai | Jinhan Dong | Jingjun Tan | Jingcheng Deng | Sihang Li | Zhifeng Gao | Haidong Wang | Zicheng Su | Agachai Sumalee | Renxin Zhong
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Unmanned Aerial Vehicle (UAV) Vision-and-Language Navigation (VLN) is vital for applications such as disaster response, logistics delivery, and urban inspection. However, existing methods often suffer from insufficient multimodal fusion, weak generalization, and poor interpretability. To address these challenges, we propose FlightGPT, a novel UAV VLN framework built upon Vision-Language Models (VLMs) with powerful multimodal perception capabilities. We design a two-stage training pipeline: first, Supervised Fine-Tuning (SFT) on high-quality demonstrations to improve initialization and structured reasoning; then, reinforcement learning with the Group Relative Policy Optimization (GRPO) algorithm, guided by a composite reward covering goal accuracy, reasoning quality, and format compliance, to enhance generalization and adaptability. Furthermore, FlightGPT introduces a Chain-of-Thought (CoT)-based reasoning mechanism to improve decision interpretability. Extensive experiments on the city-scale dataset CityNav demonstrate that FlightGPT achieves state-of-the-art performance across all scenarios, with a 9.22% higher success rate than the strongest baseline in unseen environments. Our implementation is publicly available.
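As a rough illustration of the second training stage, the sketch below shows the group-relative advantage at the core of GRPO: several rollouts are sampled for the same instruction, each receives a scalar reward, and each rollout's advantage is its reward standardized against the group mean and standard deviation. How the three reward terms are combined (`composite_reward` and its weights) is a hypothetical assumption, not FlightGPT's actual reward definition, and the policy-gradient update itself is omitted.

```python
import statistics


def composite_reward(goal_hit: float, reasoning: float, format_ok: float,
                     weights: tuple[float, float, float] = (1.0, 0.5, 0.5)) -> float:
    """Hypothetical weighted sum of the three reward terms named in the abstract."""
    w_goal, w_reason, w_fmt = weights
    return w_goal * goal_hit + w_reason * reasoning + w_fmt * format_ok


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize each rollout's reward against its own group (GRPO-style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every rollout gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical rollouts for one navigation instruction:
# (goal reached?, reasoning-quality score in [0, 1], well-formatted?)
group = [(1, 1.0, 1), (0, 1.0, 1), (1, 0.0, 1), (0, 0.0, 0)]
rewards = [composite_reward(*g) for g in group]  # [2.0, 1.0, 1.5, 0.0]
advantages = group_relative_advantages(rewards)
print([round(a, 2) for a in advantages])
```

Rollouts that beat their group's average get a positive advantage and are reinforced, so no separate value network is needed. This version uses the population standard deviation; implementations differ on that detail.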
SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis
Hengxing Cai | Xiaochen Cai | Junhan Chang | Sihang Li | Lin Yao | Wang Changxin | Zhifeng Gao | Hongshuai Wang | Li Yongge | Mujie Lin | Shuwen Yang | Jiankun Wang | Mingjun Xu | Jin Huang | Xi Fang | Jiaxi Zhuang | Yuqi Yin | Yaqi Li | Changhong Chen | Zheng Cheng | Zifeng Zhao | Linfeng Zhang | Guolin Ke
Findings of the Association for Computational Linguistics: NAACL 2025
Recent breakthroughs in Large Language Models (LLMs) have revolutionized scientific literature analysis. However, existing benchmarks fail to adequately evaluate the proficiency of LLMs in this domain, particularly in scenarios that require higher-level abilities beyond mere memorization, as well as the handling of multimodal data. In response to this gap, we introduce SciAssess, a benchmark specifically designed for the comprehensive evaluation of LLMs in scientific literature analysis. It aims to thoroughly assess the efficacy of LLMs by evaluating their capabilities in Memorization (L1), Comprehension (L2), and Analysis & Reasoning (L3). It encompasses a variety of tasks drawn from diverse scientific fields, including biology, chemistry, materials, and medicine. To ensure the reliability of SciAssess, rigorous quality control measures have been implemented, ensuring accuracy, anonymization, and compliance with copyright standards. SciAssess evaluates 11 LLMs, highlighting their strengths and areas for improvement. We hope this evaluation supports the ongoing development of LLM applications in scientific literature analysis. SciAssess and its resources are available at https://github.com/sci-assess/SciAssess.
Route Sparse Autoencoder to Interpret Large Language Models
Wei Shi | Sihang Li | Tao Liang | Mingyang Wan | Guojun Ma | Xiang Wang | Xiangnan He
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Mechanistic interpretability of large language models (LLMs) aims to uncover the internal processes of information propagation and reasoning. Sparse autoencoders (SAEs) have shown promise in this domain by extracting interpretable and monosemantic features. However, prior works primarily focus on feature extraction from a single layer and fail to effectively capture activations that span multiple layers. In this paper, we introduce Route Sparse Autoencoder (RouteSAE), a new framework that integrates a routing mechanism with a shared SAE to efficiently extract features from multiple layers. It dynamically assigns weights to activations from different layers, incurring minimal parameter overhead while achieving high interpretability and flexibility for targeted feature manipulation. We evaluate RouteSAE through extensive experiments on Llama-3.2-1B-Instruct. Specifically, under the same sparsity constraint of 64, RouteSAE extracts 22.5% more features than baseline SAEs while achieving a 22.3% higher interpretability score. These results underscore the potential of RouteSAE as a scalable and effective method for LLM interpretability, with applications in feature discovery and model intervention. Our code is available at https://github.com/swei2001/RouteSAEs.
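The routing idea can be sketched in NumPy, assuming a soft router that maps the summed residual-stream activations to a softmax over layers, mixes the layer activations with those weights, and passes the result through one shared TopK SAE. The class name `RoutedTopKSAE`, the router input, the soft (rather than hard) routing, the random initialization, and all dimensions are illustrative assumptions; the RouteSAE code release is the authority on the actual design. The toy sizes below use k = 8 rather than the sparsity of 64 reported above.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


class RoutedTopKSAE:
    """Hypothetical sketch: a layer router in front of one shared TopK SAE."""

    def __init__(self, n_layers: int, d_model: int, d_dict: int, k: int,
                 rng: np.random.Generator):
        self.k = k
        # The router is the only per-layer component: one logit per layer.
        self.W_router = rng.normal(0.0, 0.02, size=(n_layers, d_model))
        # Shared encoder/decoder, reused no matter which layers are routed in.
        self.W_enc = rng.normal(0.0, 0.1, size=(d_dict, d_model))
        self.b_enc = np.zeros(d_dict)
        self.W_dec = self.W_enc.T.copy()  # tied initialization, trained separately
        self.b_dec = np.zeros(d_model)

    def route(self, layer_acts: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        """layer_acts: (n_layers, d_model) activations for one token."""
        logits = self.W_router @ layer_acts.sum(axis=0)  # (n_layers,)
        weights = softmax(logits)
        mixed = weights @ layer_acts                     # (d_model,)
        return weights, mixed

    def encode(self, x: np.ndarray) -> np.ndarray:
        pre = self.W_enc @ (x - self.b_dec) + self.b_enc  # (d_dict,)
        z = np.zeros_like(pre)
        top = np.argpartition(pre, -self.k)[-self.k:]     # indices of k largest
        z[top] = np.maximum(pre[top], 0.0)                # ReLU on the kept units
        return z

    def forward(self, layer_acts: np.ndarray):
        weights, x = self.route(layer_acts)
        z = self.encode(x)
        x_hat = self.W_dec @ z + self.b_dec               # (d_model,)
        return weights, z, x_hat


rng = np.random.default_rng(0)
sae = RoutedTopKSAE(n_layers=4, d_model=16, d_dict=128, k=8, rng=rng)
layer_acts = rng.normal(size=(4, 16))
weights, z, x_hat = sae.forward(layer_acts)
print(weights.round(3), np.count_nonzero(z), x_hat.shape)
```

Because one encoder/decoder pair is shared across layers, the only extra parameters are the router's `n_layers × d_model` weights, which is one way a design like this keeps parameter overhead small.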
2024
ReactXT: Understanding Molecular “Reaction-ship” via Reaction-Contextualized Molecule-Text Pretraining
Zhiyuan Liu | Yaorui Shi | An Zhang | Sihang Li | Enzhi Zhang | Xiang Wang | Kenji Kawaguchi | Tat-Seng Chua
Findings of the Association for Computational Linguistics: ACL 2024
Molecule-text modeling, which aims to facilitate molecule-relevant tasks with a textual interface and textual knowledge, is an emerging research direction. Beyond single molecules, reaction-text modeling holds promise for aiding the synthesis of new materials and drugs. However, previous works mostly neglect reaction-text modeling: they primarily focus on modeling individual molecule-text pairs or on learning chemical reactions without textual context. Additionally, one key task of reaction-text modeling, experimental procedure prediction, remains underexplored due to the absence of an open-source dataset. The task is to predict the step-by-step actions for conducting chemical experiments and is crucial to automating chemical synthesis. To resolve these challenges, we propose a new pretraining method, ReactXT, for reaction-text modeling, and a new dataset, OpenExp, for experimental procedure prediction. Specifically, ReactXT features three types of input contexts to incrementally pretrain LMs. Each of the three input contexts corresponds to a pretraining task that improves the text-based understanding of either reactions or single molecules. ReactXT demonstrates consistent improvements in experimental procedure prediction and molecule captioning and offers competitive results in retrosynthesis. Our code is available at https://github.com/syr-cn/ReactXT.
MolTC: Towards Molecular Relational Modeling In Language Models
Junfeng Fang | Shuai Zhang | Chang Wu | Zhengyi Yang | Zhiyuan Liu | Sihang Li | Kun Wang | Wenjie Du | Xiang Wang
Findings of the Association for Computational Linguistics: ACL 2024
Molecular Relational Learning (MRL), which aims to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research. Recently, the adoption of large language models (LLMs), known for their vast knowledge repositories and advanced logical inference capabilities, has emerged as a promising approach to efficient and effective MRL. Despite their potential, these methods predominantly rely on textual data and thus do not fully harness the wealth of structural information inherent in molecular graphs. Moreover, the absence of a unified framework exacerbates the problem of insufficient data exploitation, as it hinders the sharing of interaction mechanisms learned across various datasets. To address these challenges, this work proposes a novel LLM-based multi-modal framework for molecular interaction modeling following Chain-of-Thought (CoT) theory, termed MolTC, which effectively integrates the graphical information of the two molecules in a pair. To train this integrated framework efficiently, we introduce a *multi-hierarchical CoT theory* to refine its training paradigm, and construct a comprehensive *Molecular Interactive Instructions* dataset for the development of biochemical LLMs involving MRL. Our experiments, conducted across various datasets involving over 4,000,000 molecular pairs, demonstrate the superiority of our method over current GNN and LLM-based baselines. Code is available at https://github.com/MangoKiller/MolTC.
2023
MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter
Zhiyuan Liu | Sihang Li | Yanchen Luo | Hao Fei | Yixin Cao | Kenji Kawaguchi | Xiang Wang | Tat-Seng Chua
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Language Models (LMs) have demonstrated impressive molecule understanding ability on various 1D text-related tasks. However, they inherently lack 2D graph perception, a critical ability that human professionals rely on to comprehend molecules' topological structures. To bridge this gap, we propose MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter. MolCA enables an LM (i.e., Galactica) to understand both text- and graph-based molecular contents via the cross-modal projector. Specifically, the cross-modal projector is implemented as a Q-Former to connect a graph encoder's representation space and an LM's text space. Further, MolCA employs a uni-modal adapter (i.e., LoRA) for the LM's efficient adaptation to downstream tasks. Unlike previous studies that couple an LM with a graph encoder via cross-modal contrastive learning, MolCA retains the LM's ability to perform open-ended text generation and augments it with 2D graph information. To showcase its effectiveness, we extensively benchmark MolCA on molecule captioning, IUPAC name prediction, and molecule-text retrieval, on which MolCA significantly outperforms the baselines.
Co-authors
- Xiang Wang 4
- Zhifeng Gao 3
- Zhiyuan Liu 3
- Hengxing Cai 2
- Zheng Cheng 2
- Tat-Seng Chua 2
- Kenji Kawaguchi 2
- Guolin Ke 2
- Linfeng Zhang 2
- Zifeng Zhao 2
- Xiaochen Cai 1
- Yixin Cao 1
- Junhan Chang 1
- Wang Changxin 1
- Changhong Chen 1
- Jingcheng Deng (邓竞成) 1
- Jinhan Dong 1
- Wenjie Du 1
- Junfeng Fang 1
- Xi Fang 1
- Hao Fei 1
- Yutang Ge 1
- Xiangnan He 1
- Jin Huang 1
- Chengpeng Li 1
- Yaqi Li 1
- Tao Liang 1
- Jiayi Liao 1
- Mujie Lin 1
- Yanchen Luo 1
- Guojun Ma 1
- Yaorui Shi 1
- Wei Shi 1
- Zicheng Su 1
- Agachai Sumalee 1
- Jingjun Tan 1
- Mingyang Wan 1
- Yu Guang Wang 1
- Haidong Wang 1
- Kun Wang 1
- Hongshuai Wang 1
- Jiankun Wang 1
- Xiang Wang 1
- Chang Wu 1
- Hanchen Xia 1
- Ziyuan Xie 1
- Mingjun Xu 1
- Zhengyi Yang 1
- Shuwen Yang 1
- Lin Yao 1
- Yuqi Yin 1
- Li Yongge 1
- Kai Zhang 1
- An Zhang 1
- Enzhi Zhang 1
- Shuai Zhang 1
- Guojiang Zhao 1
- Renxin Zhong 1
- Jiaxi Zhuang 1