Zikang Wang

2025

RNA-binding proteins (RBPs) play essential roles in post-transcriptional gene regulation via recognizing specific RNA molecules as well as modulating several key physiological processes in cellulo, represented by alternative splicing and RNA degradation. Despite extensive research, most existing approaches still rely on superficial sequence features or coarse structural representations, limiting their ability to capture the intricate nature of RBP-RNA interactions. The recent surge in large language models (LLMs), combined with advances in geometric deep learning for extracting three-dimensional representations, enables the integration of multi-modal, multi-scale biological data for precise modeling and biologically informed de novo RNA design. In this work, we curate and extend RPI15223 into a multi-resolution, structure-level RBP-RNA dataset, and introduce RBPtool, a multi-task, multi-resolution framework that combines a geometric vector perception (GVP) module together with a deep language model encoder to fuse sequence and structural information. Our tool achieves state-of-the-art performance on public benchmarks and the RPI15223 dataset, while also supporting fine-grained level predictions and enabling de novo RNA design through a generative module conditioned on protein, cell-type, and specified species. RBPtool provides a fast and versatile platform for both fundamental RBP-RNA research and practical RNA drug design, delivering enhanced predictive accuracy and fine-grained structural insights.

Large Language Models (LLMs) are revolutionizing bioinformatics, enabling advanced analysis of DNA, RNA, proteins, and single-cell data. This survey provides a systematic review of recent advancements, focusing on genomic sequence modeling, RNA structure prediction, protein function inference, and single-cell transcriptomics. Meanwhile, we also discuss several key challenges, including data scarcity, computational complexity, and cross-omics integration, and explore future directions such as multimodal learning, hybrid AI models, and clinical applications. By offering a comprehensive perspective, this paper underscores the transformative potential of LLMs in driving innovations in bioinformatics and precision medicine.

2020

pdf bib abs
Knowledge-Enhanced Natural Language Inference Based on Knowledge Graphs
Zikang Wang | Linjing Li | Daniel Zeng
Proceedings of the 28th International Conference on Computational Linguistics

Natural Language Inference (NLI) is a vital task in natural language processing. It aims to identify the logical relationship between two sentences. Most of the existing approaches make such inference based on semantic knowledge obtained through training corpus. The adoption of background knowledge is rarely seen or limited to a few specific types. In this paper, we propose a novel Knowledge Graph-enhanced NLI (KGNLI) model to leverage the usage of background knowledge stored in knowledge graphs in the field of NLI. KGNLI model consists of three components: a semantic-relation representation module, a knowledge-relation representation module, and a label prediction module. Different from previous methods, various kinds of background knowledge can be flexibly combined in the proposed KGNLI model. Experiments on four benchmarks, SNLI, MultiNLI, SciTail, and BNLI, validate the effectiveness of our model.

Co-authors

Venues

Fix author