EmCellLLM: Human Peri-Implantation Embryonic Cell Annotation Based on Large Language Models

Xiaorui Guo; Zhiwei Liu; Qianqian Xie; Sophia Ananiadou

Abstract

The advent of single-cell RNA sequencing has enabled unprecedented resolution of cell fate decisions and regulatory mechanisms during peri-implantation human embryogenesis. Accurate cell type annotation is a fundamental prerequisite and the first step for subsequent fate and mechanism inference. Large language models (LLMs) have demonstrated outstanding performance in various fields. However, current studies mostly rely on traditional methods and have not explored the application of LLMs to human embryonic cell annotation, mainly because instruction tuning datasets and evaluation benchmarks are lacking. In this paper, we propose EmCellLLM, the first open-source LLM specialized for the human embryonic cell type prediction task, obtained by fine-tuning Qwen3-8B on EmCell4Instruction, the first instruction dataset for embryonic cell type prediction. To support LLM instruction tuning, we also build EmCellBench, the first benchmark for evaluating the human embryonic cell type prediction ability of LLMs. We compare EmCellLLM with a variety of LLMs on EmCellBench, where it outperforms all other open-source LLMs as well as DeepSeek.

Anthology ID: 2026.bionlp-1.30
Volume: BioNLP 2026
Month: July
Year: 2026
Address: San Diego, California
Editors: Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
Venues: BioNLP | WS
Publisher: Association for Computational Linguistics
Pages: 382–391
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.30/
Cite (ACL): Xiaorui Guo, Zhiwei Liu, Qianqian Xie, and Sophia Ananiadou. 2026. EmCellLLM: Human Peri-Implantation Embryonic Cell Annotation Based on Large Language Models. In BioNLP 2026, pages 382–391, San Diego, California. Association for Computational Linguistics.
Cite (Informal): EmCellLLM: Human Peri-Implantation Embryonic Cell Annotation Based on Large Language Models (Guo et al., BioNLP 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.30.pdf
