Jianjun Lang
2026
LLMs as Lab Engineers: A Benchmark for Analytical Method Lifecycle Management
Xiaoyi Chen | Mahsa Monshizadeh | Chaoqi Zhang | Jianjun Lang | Yang Wu | Genevieve Mortensen | Xiaozhong Liu | Haixu Tang
Findings of the Association for Computational Linguistics: ACL 2026
We introduce ChemBench, a comprehensive benchmark for evaluating LLMs’ capabilities in analytical chemistry scenarios. Unlike existing benchmarks focused on factual knowledge, ChemBench assesses a model’s ability to provide contextualized, practical guidance for complex analytical chemistry challenges, including instrument readiness checks, system suitability testing, method development, and troubleshooting on both liquid chromatography-mass spectrometry (LC-MS) and gas chromatography-mass spectrometry (GC-MS) platforms. We evaluate three enhancement approaches: chemistry-specialized models, human-guided Chain-of-Thought reasoning, and Retrieval-Augmented Generation (RAG). Our findings reveal that general-purpose commercial models often outperform domain-specialized ones, while RAG and reasoning significantly improve performance. The six-dimension evaluation framework (specificity, correctness, usefulness, feasibility, misinformation risk, and error handling) provides insight into LLMs’ real-world utility for chemistry researchers and establishes a foundation for developing more effective AI assistants for scientific research.