Ki Hyun Kim
2025
ixi-GEN: Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining
Seonwu Kim | Yohan Na | Kihun Kim | Hanhee Cho | Geun Lim | Mintae Kim | Seongik Park | Ki Hyun Kim | Youngsub Han | Byoung-Ki Jeon
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
The emergence of open-source large language models (LLMs) has expanded opportunities for enterprise applications; however, many organizations still lack the infrastructure to deploy and maintain large-scale models. As a result, small LLMs (sLLMs) have become a practical alternative despite inherent performance limitations. While Domain Adaptive Continual Pretraining (DACP) has been explored for domain adaptation, its utility in commercial settings remains under-examined. In this study, we validate the effectiveness of a DACP-based recipe across diverse foundation models and service domains, producing DACP-applied sLLMs (ixi-GEN). Through extensive experiments and real-world evaluations, we demonstrate that ixi-GEN models achieve substantial gains in target-domain performance while preserving general capabilities, offering a cost-efficient and scalable solution for enterprise-level deployment.
2023
What, When, and How to Ground: Designing User Persona-Aware Conversational Agents for Engaging Dialogue
Deuksin Kwon | Sunwoo Lee | Ki Hyun Kim | Seojin Lee | Taeyoon Kim | Eric Davis
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
This paper presents a method for building a personalized open-domain dialogue system to address the WWH (WHAT, WHEN, and HOW) problem for natural response generation in a commercial setting, where personalized dialogue responses are heavily interleaved with casual response turns. The proposed approach involves weighted dataset blending, negative persona information augmentation methods, and the design of personalized conversation datasets to address the challenges of WWH in personalized, open-domain dialogue systems. Our work effectively balances dialogue fluency and tendency to ground, while also introducing a response-type label to improve the controllability and explainability of the grounded responses. The combination of these methods leads to more fluent conversations, as evidenced by subjective human evaluations as well as objective evaluations.