Taemin Lee
2026
Improving Korean-English Cross-Lingual Retrieval: A Data-Centric Study of Language Composition and Model Merging
Youngjoon Jang | Junyoung Son | Taemin Lee | Seongtae Hong | Hyeonseok Moon | Seungyoon Lee | Andrew Matteson | Heuiseok Lim
Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM 2026)
With the increasing utilization of multilingual text information, Cross-Lingual Information Retrieval (CLIR) has become a crucial research area. However, the impact of training data composition on CLIR and Mono-Lingual Information Retrieval (Mono-IR) performance remains underexplored. To investigate this data-centric aspect, we construct linguistically parallel Korean-English datasets and train multilingual retrieval models with various language combinations. Our experiments reveal that the language composition of training data significantly influences IR performance and exhibits important inter-lingual correlations: using specific language pairs improves CLIR performance but degrades Mono-IR performance. Our work demonstrates that simple weight-averaged model merging can effectively mitigate this trade-off, achieving strong CLIR results while preserving Mono-IR capabilities. Our findings highlight the effects of the linguistic configuration of training data on both CLIR and Mono-IR, and present model merging as a viable strategy for optimizing performance across these tasks.
2024
Intelligent Predictive Maintenance RAG framework for Power Plants: Enhancing QA with StyleDFS and Domain Specific Instruction Tuning
Seongtae Hong | Joong Min Shin | Jaehyung Seo | Taemin Lee | Jeongbae Park | Cho Man Young | Byeongho Choi | Heuiseok Lim
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track