Yuan Dong


2025

Biology-Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models
Haonan He | Yuchen Ren | Yining Tang | Ziyang Xu | Junxian Li | Minghao Yang | Di Zhang | Yuan Dong | Tao Chen | Shufei Zhang | Yuqiang Li | Nanqing Dong | Wanli Ouyang | Dongzhan Zhou | Peng Ye
Findings of the Association for Computational Linguistics: EMNLP 2025

Large language models (LLMs) have shown remarkable capabilities in general domains, but their application to multi-omics biology remains underexplored. To address this gap, we introduce Biology-Instructions, the first large-scale instruction-tuning dataset for multi-omics biological sequences, including DNA, RNA, proteins, and multi-molecules. This dataset bridges LLMs and complex biological sequence-related tasks, enhancing their versatility and reasoning while maintaining conversational fluency. We also highlight significant limitations of current state-of-the-art LLMs on multi-omics tasks without specialized training. To overcome these limitations, we propose ChatMultiOmics, a strong baseline with a novel three-stage training pipeline, demonstrating superior biological understanding through Biology-Instructions. Both resources are publicly available, paving the way for better integration of LLMs in multi-omics analysis. Biology-Instructions is available at: https://github.com/hhnqqq/Biology-Instructions.

2009

Normalized Accessor Variety Combined with Conditional Random Fields in Chinese Word Segmentation
Saike He | Taozheng Zhang | Xue Bai | Xiaojie Wang | Yuan Dong
Proceedings of the Student Research Workshop

Multi-Task Learning in Conditional Random Fields for Chunking in Shallow Semantic Parsing
Saike He | Xiaojie Wang | Yuan Dong | Taozheng Zhang | Xue Bai
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1

2008

Chinese Word Segmentation and Named Entity Recognition Based on Conditional Random Fields
Xinnian Mao | Yuan Dong | Saike He | Sencheng Bao | Haila Wang
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing

2007

Using Non-Local Features to Improve Named Entity Recognition Recall
Xinnian Mao | Wei Xu | Yuan Dong | Saike He | Haila Wang
Proceedings of the 21st Pacific Asia Conference on Language, Information and Computation

2006

France Telecom R&D Beijing Word Segmenter for Sighan Bakeoff 2006
Wu Liu | Heng Li | Yuan Dong | Nan He | Haitao Luo | Haila Wang
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

2005

Chinese Word Segmentation in FTRD Beijing
Heng Li | Yuan Dong | Xinnian Mao | Haila Wang | Wu Liu
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing