TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications

Sunwoo Lee; Daseong Jang; Dhammiko Arya; Gyoung-eun Han; Injee Song; SaeRom Kim; Sangjin Kim; Seojin Lee; Seokyoung Hong; Sereimony Sek; Seung-Mo Cho; Sohee Park; Sungbin Yoon; Wonbeom Jang; Eric Davis

Abstract

As Large Language Models (LLMs) evolve into powerful agentic systems, the telecommunications industry’s expansion into AI services necessitates industry-grounded benchmarks to evaluate their underexplored domain-specific capabilities. To address the gap left by generic benchmarks that fail to assess realistic, non-English performance, we present TelAgentBench, a Korean benchmark for the telecommunications domain evaluating five core agentic capabilities: Reasoning, Planning, Action (tool-use), Retrieval-Augmented Generation, and Instruction Following. Evaluations reveal significant performance disparities between models that employ explicit reasoning and those that do not, providing actionable insights for deploying agentic LLMs in real-world telecommunications tasks.

Anthology ID: 2025.emnlp-industry.83
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2025
Address: Suzhou (China)
Editors: Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 1173–1211
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.83/
Cite (ACL): Sunwoo Lee, Daseong Jang, Dhammiko Arya, Gyoung-eun Han, Injee Song, SaeRom Kim, Sangjin Kim, Seojin Lee, Seokyoung Hong, Sereimony Sek, Seung-Mo Cho, Sohee Park, Sungbin Yoon, Wonbeom Jang, and Eric Davis. 2025. TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1173–1211, Suzhou (China). Association for Computational Linguistics.
Cite (Informal): TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications (Lee et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.83.pdf
