Seung-Mo Cho


2025

TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications
Sunwoo Lee | Daseong Jang | Dhammiko Arya | Gyoung-eun Han | Injee Song | SaeRom Kim | Sangjin Kim | Seojin Lee | Seokyoung Hong | Sereimony Sek | Seung-Mo Cho | Sohee Park | Sungbin Yoon | Wonbeom Jang | Eric Davis
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

As Large Language Models (LLMs) evolve into powerful agentic systems, the telecommunications industry’s expansion into AI services necessitates industry-grounded benchmarks to evaluate their underexplored domain-specific capabilities. To address the gap left by generic benchmarks that fail to assess realistic, non-English performance, we present TelAgentBench, a Korean benchmark for the telecommunications domain evaluating five core agentic capabilities: Reasoning, Planning, Action (tool-use), Retrieval-Augmented Generation, and Instruction Following. Evaluations reveal significant performance disparities between models that employ explicit reasoning and those that do not, providing actionable insights for deploying agentic LLMs in real-world telecommunications tasks.

2024

TelBench: A Benchmark for Evaluating Telco-Specific Large Language Models
Sunwoo Lee | Dhammiko Arya | Seung-Mo Cho | Gyoung-eun Han | Seokyoung Hong | Wonbeom Jang | Seojin Lee | Sohee Park | Sereimony Sek | Injee Song | Sungbin Yoon | Eric Davis
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

The telecommunications industry, characterized by its vast customer base and complex service offerings, necessitates a high level of domain expertise and proficiency in customer service center operations. Consequently, there is a growing demand for Large Language Models (LLMs) to augment the capabilities of customer service representatives. This paper introduces a methodology for developing a specialized Telecommunications LLM (Telco LLM) designed to enhance the efficiency of customer service agents and promote consistency in service quality across representatives. We present the construction process of TelBench, a novel dataset created for performance evaluation of customer service expertise in the telecommunications domain. We also evaluate various LLMs and demonstrate the ability to benchmark both proprietary and open-source LLMs on predefined telecommunications-related tasks, thereby establishing metrics that define telecommunications performance.