SaeRom Kim


2025

TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications
Sunwoo Lee | Daseong Jang | Dhammiko Arya | Gyoung-eun Han | Injee Song | SaeRom Kim | Sangjin Kim | Seojin Lee | Seokyoung Hong | Sereimony Sek | Seung-Mo Cho | Sohee Park | Sungbin Yoon | Wonbeom Jang | Eric Davis
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

As Large Language Models (LLMs) evolve into powerful agentic systems, the telecommunications industry’s expansion into AI services necessitates industry-grounded benchmarks to evaluate their underexplored domain-specific capabilities. To address the gap left by generic benchmarks that fail to assess realistic, non-English performance, we present TelAgentBench, a Korean benchmark for the telecommunications domain evaluating five core agentic capabilities: Reasoning, Planning, Action (tool-use), Retrieval-Augmented Generation, and Instruction Following. Evaluations reveal significant performance disparities between models that employ explicit reasoning and those that do not, providing actionable insights for deploying agentic LLMs in real-world telecommunications tasks.