LLMs as Lab Engineers: A Benchmark for Analytical Method Lifecycle Management
Xiaoyi Chen, Mahsa Monshizadeh, Chaoqi Zhang, Jianjun Lang, Yang Wu, Genevieve Mortensen, Xiaozhong Liu, Haixu Tang
Abstract
We introduce ChemBench, a comprehensive benchmark for evaluating LLMs’ capabilities in analytical chemistry scenarios. Unlike existing benchmarks focused on factual knowledge, ChemBench assesses a model’s ability to provide contextualized, practical guidance for complex analytical chemistry challenges, including instrument readiness checks, system suitability testing, method development, and troubleshooting on both liquid chromatography-mass spectrometry (LC-MS) and gas chromatography-mass spectrometry (GC-MS) platforms. We evaluate three enhancement approaches: chemistry-specialized models, human-guided Chain-of-Thought reasoning, and Retrieval-Augmented Generation (RAG). Our findings reveal that general-purpose commercial models often outperform domain-specialized ones, while RAG and reasoning significantly improve performance. The six-dimension evaluation framework (specificity, correctness, usefulness, feasibility, misinformation risk, and error handling) provides valuable insights into LLMs’ real-world utility for chemistry researchers, establishing a foundation for developing more effective AI assistants for scientific research.
- Anthology ID:
- 2026.findings-acl.2015
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 40533–40553
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2015/
- Cite (ACL):
- Xiaoyi Chen, Mahsa Monshizadeh, Chaoqi Zhang, Jianjun Lang, Yang Wu, Genevieve Mortensen, Xiaozhong Liu, and Haixu Tang. 2026. LLMs as Lab Engineers: A Benchmark for Analytical Method Lifecycle Management. In Findings of the Association for Computational Linguistics: ACL 2026, pages 40533–40553, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- LLMs as Lab Engineers: A Benchmark for Analytical Method Lifecycle Management (Chen et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2015.pdf