TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications
Sunwoo Lee, Daseong Jang, Dhammiko Arya, Gyoung-eun Han, Injee Song, SaeRom Kim, Sangjin Kim, Seojin Lee, Seokyoung Hong, Sereimony Sek, Seung-Mo Cho, Sohee Park, Sungbin Yoon, Wonbeom Jang, Eric Davis
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, 2025
As Large Language Models (LLMs) evolve into powerful agentic systems, the telecommunications industry’s expansion into AI services necessitates industry-grounded benchmarks to evaluate their underexplored domain-specific capabilities. To address the gap left by generic benchmarks that fail to assess realistic, non-English performance, we present TelAgentBench, a Korean benchmark for the telecommunications domain evaluating five core agentic capabilities: Reasoning, Planning, Action (tool-use), Retrieval-Augmented Generation, and Instruction Following. Evaluations reveal significant performance disparities between models that employ explicit reasoning and those that do not, providing actionable insights for deploying agentic LLMs in real-world telecommunications tasks.