Can Large Language Models Classify and Generate Antimicrobial Resistance Genes?

Hyunwoo Yoo, Haebin Shin, Gail Rosen


Abstract
This study explores the application of generative Large Language Models (LLMs) in DNA sequence analysis, highlighting their advantages over encoder-based models like DNABERT2 and Nucleotide Transformer. While encoder models excel in classification, they struggle to integrate external textual information. In contrast, generative LLMs can incorporate domain knowledge, such as BLASTn annotations, to improve classification accuracy even without fine-tuning. We evaluate this capability on antimicrobial resistance (AMR) gene classification, comparing generative LLMs with encoder-based baselines. Results show that LLMs significantly enhance classification when supplemented with textual information. Additionally, we demonstrate their potential in DNA sequence generation, further expanding their applicability. Our findings suggest that LLMs offer a novel paradigm for integrating biological sequences with external knowledge, bridging gaps in traditional classification methods.
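As a rough illustration of the approach described in the abstract (not code from the paper), the sketch below shows how a DNA sequence and a BLASTn annotation might be combined into a single classification prompt for a generative LLM. The function name, example sequence, annotation text, and drug-class list are all hypothetical.

```python
# Hypothetical sketch: pair a DNA sequence with BLASTn annotation text
# in one prompt so a generative LLM can classify the AMR drug class.
# All names and example data here are illustrative, not from the paper.

def build_amr_prompt(dna_sequence: str, blastn_annotation: str, drug_classes: list[str]) -> str:
    """Assemble a zero-shot classification prompt that pairs the raw
    sequence with external textual evidence (e.g., a BLASTn top hit)."""
    classes = ", ".join(drug_classes)
    return (
        "You are an expert in antimicrobial resistance (AMR) genes.\n"
        f"Candidate drug classes: {classes}\n\n"
        f"DNA sequence:\n{dna_sequence}\n\n"
        f"BLASTn annotation of the closest hit:\n{blastn_annotation}\n\n"
        "Answer with exactly one drug class from the list."
    )

if __name__ == "__main__":
    prompt = build_amr_prompt(
        dna_sequence="ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTT...",
        blastn_annotation="blaTEM-1 beta-lactamase gene, 99% identity",
        drug_classes=["beta-lactam", "aminoglycoside", "tetracycline", "macrolide"],
    )
    print(prompt)  # The prompt would then be sent to a generative LLM API.
```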
Anthology ID:
2025.bionlp-1.21
Volume:
ACL 2025
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Junichi Tsujii
Venues:
BioNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
240–248
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.21/
Cite (ACL):
Hyunwoo Yoo, Haebin Shin, and Gail Rosen. 2025. Can Large Language Models Classify and Generate Antimicrobial Resistance Genes?. In ACL 2025, pages 240–248, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Can Large Language Models Classify and Generate Antimicrobial Resistance Genes? (Yoo et al., BioNLP 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.21.pdf
Supplementary material:
 2025.bionlp-1.21.SupplementaryMaterial.txt