Jason Perera


2025

Virtual CRISPR: Can LLMs Predict CRISPR Screen Results?
Steven Song | Abdalla Abdrabou | Asmita Dabholkar | Kastan Day | Pavan Dharmoju | Jason Perera | Volodymyr Kindratenko | Aly Khan
ACL 2025

CRISPR-Cas systems enable systematic investigation of gene function, but experimental CRISPR screens are resource-intensive. Here, we investigate the potential of Large Language Models (LLMs) to predict the outcomes of CRISPR screens in silico, thereby prioritizing experiments and accelerating biological discovery. We introduce a benchmark dataset derived from BioGRID-ORCS and manually curated sources, and evaluate the performance of several LLMs across various prompting strategies, including chain-of-thought and few-shot learning. Furthermore, we develop a novel, efficient prediction framework using LLM-derived embeddings, achieving significantly improved performance and scalability compared to direct prompting. Our results demonstrate the feasibility of using LLMs to guide CRISPR screen experiments.