Yuan Dong



2025

Biology-Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models
Haonan He | Yuchen Ren | Yining Tang | Ziyang Xu | Junxian Li | Minghao Yang | Di Zhang | Yuan Dong | Tao Chen | Shufei Zhang | Yuqiang Li | Nanqing Dong | Wanli Ouyang | Dongzhan Zhou | Peng Ye
Findings of the Association for Computational Linguistics: EMNLP 2025

Large language models (LLMs) have shown remarkable capabilities in general domains, but their application to multi-omics biology remains underexplored. To address this gap, we introduce Biology-Instructions, the first large-scale instruction-tuning dataset for multi-omics biological sequences, including DNA, RNA, proteins, and multi-molecules. This dataset bridges LLMs and complex biological sequence-related tasks, enhancing their versatility and reasoning while maintaining conversational fluency. We also highlight significant limitations of current state-of-the-art LLMs on multi-omics tasks without specialized training. To overcome this, we propose ChatMultiOmics, a strong baseline with a novel three-stage training pipeline, demonstrating superior biological understanding through Biology-Instructions. Both resources are publicly available, paving the way for better integration of LLMs in multi-omics analysis. Biology-Instructions is available at https://github.com/hhnqqq/Biology-Instructions.

2009

Normalized Accessor Variety Combined with Conditional Random Fields in Chinese Word Segmentation
Saike He | Taozheng Zhang | Xue Bai | Xiaojie Wang | Yuan Dong
Proceedings of the Student Research Workshop

Multi-Task Learning in Conditional Random Fields for Chunking in Shallow Semantic Parsing
Saike He | Xiaojie Wang | Yuan Dong | Taozheng Zhang | Xue Bai
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1

2008

Chinese Word Segmentation and Named Entity Recognition Based on Conditional Random Fields
Xinnian Mao | Yuan Dong | Saike He | Sencheng Bao | Haila Wang
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing

2007

Using Non-Local Features to Improve Named Entity Recognition Recall
Xinnian Mao | Wei Xu | Yuan Dong | Saike He | Haila Wang
Proceedings of the 21st Pacific Asia Conference on Language, Information and Computation

2006

France Telecom R&D Beijing Word Segmenter for Sighan Bakeoff 2006
Wu Liu | Heng Li | Yuan Dong | Nan He | Haitao Luo | Haila Wang
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

2005

Chinese Word Segmentation in FTRD Beijing
Heng Li | Yuan Dong | Xinnian Mao | Haila Wang | Wu Liu
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing