TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications
Sunwoo Lee, Daseong Jang, Dhammiko Arya, Gyoung-eun Han, Injee Song, SaeRom Kim, Sangjin Kim, Seojin Lee, Seokyoung Hong, Sereimony Sek, Seung-Mo Cho, Sohee Park, Sungbin Yoon, Wonbeom Jang, Eric Davis
Abstract
As Large Language Models (LLMs) evolve into powerful agentic systems, the telecommunications industry’s expansion into AI services necessitates industry-grounded benchmarks to evaluate their underexplored domain-specific capabilities. To address the gap left by generic benchmarks that fail to assess realistic, non-English performance, we present TelAgentBench, a Korean benchmark for the telecommunications domain evaluating five core agentic capabilities: Reasoning, Planning, Action (tool-use), Retrieval-Augmented Generation, and Instruction Following. Evaluations reveal significant performance disparities between models that employ explicit reasoning and those that do not, providing actionable insights for deploying agentic LLMs in real-world telecommunications tasks.
- Anthology ID:
- 2025.emnlp-industry.83
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou (China)
- Editors:
- Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1173–1211
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.83/
- Cite (ACL):
- Sunwoo Lee, Daseong Jang, Dhammiko Arya, Gyoung-eun Han, Injee Song, SaeRom Kim, Sangjin Kim, Seojin Lee, Seokyoung Hong, Sereimony Sek, Seung-Mo Cho, Sohee Park, Sungbin Yoon, Wonbeom Jang, and Eric Davis. 2025. TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1173–1211, Suzhou (China). Association for Computational Linguistics.
- Cite (Informal):
- TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications (Lee et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.83.pdf