LLMs as Lab Engineers: A Benchmark for Analytical Method Lifecycle Management

Xiaoyi Chen; Mahsa Monshizadeh; Chaoqi Zhang; Jianjun Lang; Yang Wu; Genevieve Mortensen; Xiaozhong Liu; Haixu Tang

Abstract

We introduce ChemBench, a comprehensive benchmark for evaluating the capabilities of LLMs in analytical chemistry scenarios. Unlike existing benchmarks focused on factual knowledge, ChemBench assesses a model's ability to provide contextualized, practical guidance for complex analytical chemistry challenges, including instrument readiness checks, system suitability testing, method development, and troubleshooting for both liquid chromatography-mass spectrometry (LC-MS) and gas chromatography-mass spectrometry (GC-MS) platforms. We evaluate three enhancement approaches: chemistry-specialized models, human-guided Chain-of-Thought reasoning, and Retrieval-Augmented Generation (RAG). Our findings reveal that general-purpose commercial models often outperform domain-specialized ones, while RAG and reasoning significantly improve performance. The six-dimension evaluation framework (specificity, correctness, usefulness, feasibility, misinformation risk, and error handling) provides valuable insights into the real-world utility of LLMs for chemistry researchers, establishing a foundation for developing more effective AI assistants for scientific research.

Anthology ID: 2026.findings-acl.2015
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 40533–40553
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2015/
Cite (ACL): Xiaoyi Chen, Mahsa Monshizadeh, Chaoqi Zhang, Jianjun Lang, Yang Wu, Genevieve Mortensen, Xiaozhong Liu, and Haixu Tang. 2026. LLMs as Lab Engineers: A Benchmark for Analytical Method Lifecycle Management. In Findings of the Association for Computational Linguistics: ACL 2026, pages 40533–40553, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): LLMs as Lab Engineers: A Benchmark for Analytical Method Lifecycle Management (Chen et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2015.pdf
Checklist: 2026.findings-acl.2015.checklist.pdf
