TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications

Sunwoo Lee, Daseong Jang, Dhammiko Arya, Gyoung-eun Han, Injee Song, SaeRom Kim, Sangjin Kim, Seojin Lee, Seokyoung Hong, Sereimony Sek, Seung-Mo Cho, Sohee Park, Sungbin Yoon, Wonbeom Jang, Eric Davis


Abstract
As Large Language Models (LLMs) evolve into powerful agentic systems, the telecommunications industry’s expansion into AI services necessitates industry-grounded benchmarks to evaluate their underexplored domain-specific capabilities. To address the gap left by generic benchmarks that fail to assess realistic, non-English performance, we present TelAgentBench, a Korean benchmark for the telecommunications domain evaluating five core agentic capabilities: Reasoning, Planning, Action (tool-use), Retrieval-Augmented Generation, and Instruction Following. Evaluations reveal significant performance disparities between models that employ explicit reasoning and those that do not, providing actionable insights for deploying agentic LLMs in real-world telecommunications tasks.
Anthology ID:
2025.emnlp-industry.83
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2025
Address:
Suzhou (China)
Editors:
Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
1173–1211
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.83/
Cite (ACL):
Sunwoo Lee, Daseong Jang, Dhammiko Arya, Gyoung-eun Han, Injee Song, SaeRom Kim, Sangjin Kim, Seojin Lee, Seokyoung Hong, Sereimony Sek, Seung-Mo Cho, Sohee Park, Sungbin Yoon, Wonbeom Jang, and Eric Davis. 2025. TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1173–1211, Suzhou (China). Association for Computational Linguistics.
Cite (Informal):
TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications (Lee et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.83.pdf