SDBench: A Survey-based Domain-specific LLM Benchmarking and Optimization Framework

Cheng Guo, Hu Kai, Shuxian Liang, Yiyang Jiang, Yi Gao, Xian-Sheng Hua, Wei Dong


Abstract
The rapid advancement of large language models (LLMs) in recent years has made it feasible to build domain-specific LLMs for specialized fields. In practice, however, acquiring domain-specific knowledge often requires substantial effort from domain experts. Moreover, even when domain-specific data is available, the lack of a unified methodology for constructing benchmark datasets often results in uneven data distribution. This imbalance can lead to an inaccurate assessment of a model's true capabilities when evaluating domain-specific LLMs. To address these challenges, we introduce **SDBench**, a generic framework for generating evaluation datasets for domain-specific LLMs. The method can also be used to build LLM instruction datasets. It significantly reduces the reliance on expert effort while ensuring that the collected data is uniformly distributed. To validate the effectiveness of this framework, we also present **BridgeBench**, a novel benchmark for bridge engineering knowledge, and **BridgeGPT**, the first LLM specialized in bridge engineering, which can solve bridge engineering tasks.
Anthology ID:
2025.acl-long.662
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
13492–13506
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.662/
Cite (ACL):
Cheng Guo, Hu Kai, Shuxian Liang, Yiyang Jiang, Yi Gao, Xian-Sheng Hua, and Wei Dong. 2025. SDBench: A Survey-based Domain-specific LLM Benchmarking and Optimization Framework. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13492–13506, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
SDBench: A Survey-based Domain-specific LLM Benchmarking and Optimization Framework (Guo et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.662.pdf