Zikang Wang


2025

RBPtool: A Deep Language Model Framework for Multi-Resolution RBP-RNA Binding Prediction and RNA Molecule Design
Jiyue Jiang | Yitao Xu | Zikang Wang | Yihan Ye | Yanruisheng Shao | Yuheng Shan | Jiuming Wang | Xiaodan Fan | Jiao Yuan | Yu Li
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

RNA-binding proteins (RBPs) play essential roles in post-transcriptional gene regulation by recognizing specific RNA molecules and modulating key physiological processes in cellulo, such as alternative splicing and RNA degradation. Despite extensive research, most existing approaches still rely on superficial sequence features or coarse structural representations, limiting their ability to capture the intricate nature of RBP-RNA interactions. The recent surge in large language models (LLMs), combined with advances in geometric deep learning for extracting three-dimensional representations, enables the integration of multi-modal, multi-scale biological data for precise modeling and biologically informed de novo RNA design. In this work, we curate and extend RPI15223 into a multi-resolution, structure-level RBP-RNA dataset, and introduce RBPtool, a multi-task, multi-resolution framework that combines a geometric vector perceptron (GVP) module with a deep language model encoder to fuse sequence and structural information. RBPtool achieves state-of-the-art performance on public benchmarks and the RPI15223 dataset, while also supporting fine-grained predictions and enabling de novo RNA design through a generative module conditioned on protein, cell type, and species. RBPtool provides a fast and versatile platform for both fundamental RBP-RNA research and practical RNA drug design, delivering enhanced predictive accuracy and fine-grained structural insights.

Large Language Models in Bioinformatics: A Survey
Zhenyu Wang | Zikang Wang | Jiyue Jiang | Pengan Chen | Xiangyu Shi | Yu Li
Findings of the Association for Computational Linguistics: ACL 2025

Large Language Models (LLMs) are revolutionizing bioinformatics, enabling advanced analysis of DNA, RNA, proteins, and single-cell data. This survey provides a systematic review of recent advancements, focusing on genomic sequence modeling, RNA structure prediction, protein function inference, and single-cell transcriptomics. We also discuss several key challenges, including data scarcity, computational complexity, and cross-omics integration, and explore future directions such as multimodal learning, hybrid AI models, and clinical applications. By offering a comprehensive perspective, this paper underscores the transformative potential of LLMs in driving innovations in bioinformatics and precision medicine.

2020

Knowledge-Enhanced Natural Language Inference Based on Knowledge Graphs
Zikang Wang | Linjing Li | Daniel Zeng
Proceedings of the 28th International Conference on Computational Linguistics

Natural Language Inference (NLI) is a vital task in natural language processing that aims to identify the logical relationship between two sentences. Most existing approaches make such inferences based on semantic knowledge obtained from the training corpus; background knowledge is rarely adopted, or is limited to a few specific types. In this paper, we propose a novel Knowledge Graph-enhanced NLI (KGNLI) model that leverages background knowledge stored in knowledge graphs for NLI. The KGNLI model consists of three components: a semantic-relation representation module, a knowledge-relation representation module, and a label prediction module. Unlike previous methods, KGNLI can flexibly combine various kinds of background knowledge. Experiments on four benchmarks, SNLI, MultiNLI, SciTail, and BNLI, validate the effectiveness of our model.