ixi-GEN: Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining

Seonwu Kim; Yohan Na; Kihun Kim; Hanhee Cho; Geun Lim; Mintae Kim; Seongik Park; Ki Hyun Kim; Youngsub Han; Byoung-Ki Jeon

Abstract

The emergence of open-source large language models (LLMs) has expanded opportunities for enterprise applications; however, many organizations still lack the infrastructure to deploy and maintain large-scale models. As a result, small LLMs (sLLMs) have become a practical alternative despite inherent performance limitations. While Domain Adaptive Continual Pretraining (DACP) has been explored for domain adaptation, its utility in commercial settings remains under-examined. In this study, we validate the effectiveness of a DACP-based recipe across diverse foundation models and service domains, producing DACP-applied sLLMs (ixi-GEN). Through extensive experiments and real-world evaluations, we demonstrate that ixi-GEN models achieve substantial gains in target-domain performance while preserving general capabilities, offering a cost-efficient and scalable solution for enterprise-level deployment.

Anthology ID: 2025.emnlp-industry.165
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2025
Address: Suzhou (China)
Editors: Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 2387–2404
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.165/
Cite (ACL): Seonwu Kim, Yohan Na, Kihun Kim, Hanhee Cho, Geun Lim, Mintae Kim, Seongik Park, Ki Hyun Kim, Youngsub Han, and Byoung-Ki Jeon. 2025. ixi-GEN: Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 2387–2404, Suzhou (China). Association for Computational Linguistics.
Cite (Informal): ixi-GEN: Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining (Kim et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.165.pdf