Sunwoo Lee
As Large Language Models (LLMs) evolve into powerful agentic systems, the telecommunications industry’s expansion into AI services necessitates industry-grounded benchmarks to evaluate their underexplored domain-specific capabilities. To address the gap left by generic benchmarks that fail to assess realistic, non-English performance, we present TelAgentBench, a Korean benchmark for the telecommunications domain evaluating five core agentic capabilities: Reasoning, Planning, Action (tool-use), Retrieval-Augmented Generation, and Instruction Following. Evaluations reveal significant performance disparities between models that employ explicit reasoning and those that do not, providing actionable insights for deploying agentic LLMs in real-world telecommunications tasks.
The telecommunications industry, characterized by its vast customer base and complex service offerings, necessitates a high level of domain expertise and proficiency in customer service center operations. Consequently, there is a growing demand for Large Language Models (LLMs) to augment the capabilities of customer service representatives. This paper introduces a methodology for developing a specialized Telecommunications LLM (Telco LLM) designed to enhance the efficiency of customer service agents and promote consistency in service quality across representatives. We present the construction process of TelBench, a novel dataset created for performance evaluation of customer service expertise in the telecommunications domain. We also evaluate various LLMs and demonstrate the ability to benchmark both proprietary and open-source LLMs on predefined telecommunications-related tasks, thereby establishing metrics that define telecommunications performance.
This paper presents a method for building a personalized open-domain dialogue system to address the WWH (WHAT, WHEN, and HOW) problem for natural response generation in a commercial setting, where personalized dialogue responses are heavily interleaved with casual response turns. The proposed approach involves weighted dataset blending, negative persona information augmentation methods, and the design of personalized conversation datasets to address the challenges of WWH in personalized, open-domain dialogue systems. Our work effectively balances dialogue fluency with the tendency to ground responses, while also introducing a response-type label to improve the controllability and explainability of the grounded responses. The combination of these methods leads to more fluent conversations, as evidenced by both subjective human evaluations and objective evaluations.