ixi-GEN: Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining
Seonwu Kim | Yohan Na | Kihun Kim | Hanhee Cho | Geun Lim | Mintae Kim | Seongik Park | Ki Hyun Kim | Youngsub Han | Byoung-Ki Jeon
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
The emergence of open-source large language models (LLMs) has expanded opportunities for enterprise applications; however, many organizations still lack the infrastructure to deploy and maintain large-scale models. As a result, small LLMs (sLLMs) have become a practical alternative despite inherent performance limitations. While Domain Adaptive Continual Pretraining (DACP) has been explored for domain adaptation, its utility in commercial settings remains under-examined. In this study, we validate the effectiveness of a DACP-based recipe across diverse foundation models and service domains, producing DACP-applied sLLMs (ixi-GEN). Through extensive experiments and real-world evaluations, we demonstrate that ixi-GEN models achieve substantial gains in target-domain performance while preserving general capabilities, offering a cost-efficient and scalable solution for enterprise-level deployment.
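For readers unfamiliar with the technique, DACP amounts to resuming next-token-prediction training of an already pretrained model on an in-domain corpus. The sketch below illustrates the general idea only; the checkpoint name, corpus path, and hyperparameters are hypothetical placeholders, not the paper's actual ixi-GEN recipe.

```python
# Minimal DACP-style sketch: continue causal-LM pretraining on a domain corpus.
# Assumes Hugging Face transformers + PyTorch; all names below are illustrative.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "some-open-sllm"  # hypothetical open-source sLLM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

# Tokenize the raw domain text and pack it into fixed-length blocks.
with open("domain_corpus.txt") as f:  # hypothetical in-domain text file
    ids = tokenizer(f.read(), return_tensors="pt").input_ids[0]
block = 1024
blocks = [ids[i:i + block] for i in range(0, len(ids) - block, block)]

# A low learning rate is one common way to limit catastrophic forgetting
# of general capabilities while adapting to the target domain.
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
for batch in DataLoader(blocks, batch_size=2):
    # Standard causal-LM objective: the model shifts labels internally.
    loss = model(input_ids=batch, labels=batch).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```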