Zhiwei Zhang


2025

pdf bib
GAMIC: Graph-Aligned Molecular In-context Learning for Molecule Analysis via LLMs
Ali Al Lawati | Jason S Lucas | Zhiwei Zhang | Prasenjit Mitra | Suhang Wang
Findings of the Association for Computational Linguistics: EMNLP 2025

In-context learning (ICL) effectively conditions large language models (LLMs) for molecular tasks, such as property prediction and molecule captioning, by embedding carefully selected demonstration examples into the input prompt. This approach eliminates the computational overhead of extensive pre-training and fine-tuning. However, current prompt retrieval methods for molecular tasks rely on molecule feature similarity, such as Morgan fingerprints, which do not adequately capture the global molecular and atom-binding relationships. As a result, these methods fail to represent the full complexity of molecular structures during inference. Moreover, medium-sized LLMs, which offer simpler deployment requirements in specialized systems, have remained largely unexplored in the molecular ICL literature. To address these gaps, we propose a self-supervised learning technique, GAMIC (Graph-Aligned Molecular In-Context learning), which aligns global molecular structures, represented by graph neural networks (GNNs), with textual captions (descriptions) while leveraging local feature similarity through Morgan fingerprints. In addition, we introduce a Maximum Marginal Relevance (MMR) based diversity heuristic during retrieval to optimize input prompt demonstration samples. Our experimental findings using diverse benchmark datasets show GAMIC outperforms simple Morgan-based ICL retrieval methods across all tasks by up to 45%. Our code is available at: https://github.com/aliwister/mol-icl.

2021

pdf bib
Abstract, Rationale, Stance: A Joint Model for Scientific Claim Verification
Zhiwei Zhang | Jiyi Li | Fumiyo Fukumoto | Yanming Ye
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Scientific claim verification can help the researchers to easily find the target scientific papers with the sentence evidence from a large corpus for the given claim. Some existing works propose pipeline models on the three tasks of abstract retrieval, rationale selection and stance prediction. Such works have the problems of error propagation among the modules in the pipeline and lack of sharing valuable information among modules. We thus propose an approach, named as ARSJoint, that jointly learns the modules for the three tasks with a machine reading comprehension framework by including claim information. In addition, we enhance the information exchanges and constraints among tasks by proposing a regularization term between the sentence attention scores of abstract retrieval and the estimated outputs of rational selection. The experimental results on the benchmark dataset SciFact show that our approach outperforms the existing works.

2015

pdf bib
LDTM: A Latent Document Type Model for Cumulative Citation Recommendation
Jingang Wang | Dandan Song | Zhiwei Zhang | Lejian Liao | Luo Si | Chin-Yew Lin
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing