Dongjun Kim
2026
MMAC: A Multilingual, Multimodal Alignment Framework for Cultural Grounding Evaluation
Weihua Zheng | Zhengyuan Liu | Tanmoy Chakraborty | Weiwen Xu | Xiaoxue Gao | Bryan Chen Zhengyu Tan | Bowei Zou | Chang Liu | Yujia Hu | Xing Xie | Xiaoyuan Yi | Jing Yao | Chaojun Wang | Long Li | Rui Liu | Huiyao Liu | Koji Inoue | Ryuichi Sumida | Tatsuya Kawahara | Fan Xu | Lingyu Ye | Wei Tian | Dongjun Kim | Jimin Jung | Jaehyung Seo | Nadya Yuki Wangsajaya | Pham Minh Duc | Ojasva Saxena | Palash Nandi | Xiyan Tao | Wiwik Karlina | Tuan Luong | Keertana Arun Vasan | Roy Ka-Wei Lee | Nancy F. Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The global deployment of Large Language Models (LLMs) underscores the urgent need to evaluate their cultural alignment. However, assessing genuine "cultural awareness" across modalities (text, vision, speech) and languages remains a significant challenge. To comprehensively investigate this domain, we propose MMAC, a systematic framework that encompasses a tri-modally aligned cultural benchmark creation pipeline and a five-dimensional evaluation protocol to assess cross-country awareness disparities, evaluate cross-lingual and cross-modal consistency, and verify cultural knowledge generalization and grounding validity. Given the prevailing Western cultural bias in current models, we focus on 8 Asian countries as our dataset foundation to more acutely reveal potential cultural deficiencies in LLMs. Our dataset, MMAC-bench, features 27,000 human-curated questions across 10 languages. Crucially, it is the first dataset aligned at the input level across text, image, and speech, enabling direct cross-modal transfer tests. Each question consists of multiple-choice options accompanied by open-ended generated explanations, where 79% require multi-step reasoning grounded in cultural context, moving beyond simple memorization. We probe the causes of modal divergence, offering insights into fostering culturally robust MLLMs.
LangSAE Editing: Improving Multilingual Information Retrieval via Post-hoc Language Identity Removal
Dongjun Kim | Jeongho Yoon | Chanjun Park | Heuiseok Lim
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Dense retrieval in multilingual settings often searches over mixed-language collections, yet multilingual embeddings encode language identity alongside semantics. This language signal can inflate similarity for same-language pairs and crowd out relevant evidence written in other languages. We propose LANGSAE EDITING, a post-hoc sparse autoencoder trained on pooled embeddings that enables controllable removal of language-identity signal directly in vector space. The method identifies language-associated latent units using cross-language activation statistics, suppresses these units at inference time, and reconstructs embeddings in the original dimensionality, making it compatible with existing vector databases without retraining the base encoder or re-encoding raw text. Experiments across multiple languages show consistent improvements in ranking quality and cross-language coverage, with especially strong gains for script-distinct languages.
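The identify, suppress, and reconstruct steps described in this abstract can be illustrated with a toy sketch. This is not the authors' implementation: the identity encoder and decoder stand in for a trained sparse autoencoder, the max-minus-min gap of per-language mean activations is an assumed stand-in for the paper's "cross-language activation statistics", and the names `language_units`, `edit_embedding`, and the `threshold` value are hypothetical.

```python
# Toy sketch of post-hoc language-identity removal in latent space.
# Assumption: W_ENC / W_DEC are identity matrices standing in for a trained SAE.

def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def encode(x, w_enc):
    """Linear map followed by ReLU, giving non-negative latent activations."""
    return [max(0.0, a) for a in matvec(w_enc, x)]

def language_units(latents_by_lang, threshold):
    """Flag units whose mean activation differs across languages by more than threshold."""
    means = {
        lang: [sum(col) / len(rows) for col in zip(*rows)]
        for lang, rows in latents_by_lang.items()
    }
    n_units = len(next(iter(means.values())))
    flagged = set()
    for j in range(n_units):
        per_lang = [m[j] for m in means.values()]
        if max(per_lang) - min(per_lang) > threshold:
            flagged.add(j)
    return flagged

def edit_embedding(x, w_enc, w_dec, flagged):
    """Encode, zero the flagged units, and decode back to the original dimensionality."""
    z = encode(x, w_enc)
    z = [0.0 if j in flagged else a for j, a in enumerate(z)]
    return matvec(w_dec, z)

W_ENC = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
W_DEC = W_ENC

# Dimension 0 carries meaning; dimensions 1 and 2 carry made-up language signals.
embeddings = {
    "en": [[0.5, 1.0, 0.0], [0.7, 1.0, 0.0]],
    "ko": [[0.5, 0.0, 1.0], [0.7, 0.0, 1.0]],
}
latents = {lang: [encode(x, W_ENC) for x in xs] for lang, xs in embeddings.items()}
flagged = language_units(latents, threshold=0.5)

edited_en = edit_embedding(embeddings["en"][0], W_ENC, W_DEC, flagged)
edited_ko = edit_embedding(embeddings["ko"][0], W_ENC, W_DEC, flagged)
```

In this toy setup the two embeddings that share a meaning but differ in language collapse to the same vector after editing, and the output keeps the input dimensionality, which is the property that would let edited vectors be written back to an existing index.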
Exploring Coding Spot: Understanding Parametric Contributions to LLM Coding Performance
Dongjun Kim | Minhyuk Kim | Yongchan Chun | Chanjun Park | Heuiseok Lim
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) have demonstrated notable proficiency in both code generation and comprehension across multiple programming languages. However, the mechanisms underlying this proficiency remain underexplored, particularly with respect to whether distinct programming languages are processed independently or within a shared parametric region. Drawing an analogy to the specialized regions of the brain responsible for distinct cognitive functions, we introduce the concept of Coding Spot, a specialized parametric region within LLMs that facilitates coding capabilities. Our findings identify this Coding Spot and show that targeted modifications to this subset significantly affect performance on coding tasks, while largely preserving non-coding functionalities. This compartmentalization mirrors the functional specialization observed in cognitive neuroscience, where specific brain regions are dedicated to distinct tasks, suggesting that LLMs may similarly employ specialized parameter regions for different knowledge domains.
2025
Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks
Dongjun Kim | Gyuho Shim | Yongchan Chun | Minhyuk Kim | Chanjun Park | Heuiseok Lim
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large Language Models are commonly judged by their scores on standard benchmarks, yet such scores often overstate real capability since they mask the mix of skills a task actually demands. For example, ARC is assumed to test reasoning, while HellaSwag is designed to evaluate commonsense. However, we lack a systematic way to verify whether these benchmarks actually measure what their labels claim. We introduce BENCHMARK PROFILING, a diagnostic framework that decomposes benchmark performance into ten cognitively grounded abilities. The method combines gradient-based importance scoring with targeted parameter ablation to compute an Ability Impact Score (AIS) that quantifies how much each ability contributes to a model’s success on a given benchmark. Profiling three instruction-tuned models across ten widely used benchmarks yields four key findings: (i) most benchmarks draw on several abilities rather than one, (ii) datasets with similar labels rely on distinct ability mixtures, (iii) code-generation benchmarks reward broad, multi-skill improvement and thus show only modest gains from narrow domain-specific fine-tuning, and (iv) abilities irrelevant to the task could negatively affect performance. BENCHMARK PROFILING therefore explains why performance gains do not always translate into user-perceived competence and offers a transparent tool for benchmark audit and model interpretability.
Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval
Yongchan Chun | Minhyuk Kim | Dongjun Kim | Chanjun Park | Heuiseok Lim
Findings of the Association for Computational Linguistics: ACL 2025
Automatic Term Extraction (ATE) identifies domain-specific expressions that are crucial for downstream tasks such as machine translation and information retrieval. Although large language models (LLMs) have significantly advanced various NLP tasks, their potential for ATE has scarcely been examined. We propose a retrieval-based prompting strategy that, in the few-shot setting, selects demonstrations according to syntactic rather than semantic similarity. This syntactic retrieval method is domain-agnostic and provides more reliable guidance for capturing term boundaries. We evaluate the approach in both in-domain and cross-domain settings, analyzing how lexical overlap between the query sentence and its retrieved examples affects performance. Experiments on three specialized ATE benchmarks show that syntactic retrieval improves F1-score. These findings highlight the importance of syntactic cues when adapting LLMs to terminology-extraction tasks.
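The idea of picking few-shot demonstrations by syntactic rather than semantic similarity can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: syntax is approximated by hand-assigned part-of-speech tag sequences, similarity is the standard-library `difflib.SequenceMatcher` ratio, and the demonstration pool, the tags, and the function names are all hypothetical.

```python
from difflib import SequenceMatcher

def syntactic_similarity(tags_a, tags_b):
    """Similarity of two POS-tag sequences in [0, 1]; the words themselves are ignored."""
    return SequenceMatcher(None, tags_a, tags_b).ratio()

def retrieve_demonstrations(query_tags, pool, k):
    """Return the k pool entries whose tag sequences are closest to the query's."""
    ranked = sorted(
        pool,
        key=lambda demo: syntactic_similarity(query_tags, demo["tags"]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical demonstration pool; the tags are assigned by hand for illustration.
pool = [
    {"id": "A", "sentence": "The sparse encoder reduces latency",
     "tags": ["DET", "ADJ", "NOUN", "VERB", "NOUN"]},
    {"id": "B", "sentence": "It works well",
     "tags": ["PRON", "VERB", "ADV"]},
    {"id": "C", "sentence": "The model predicts the label",
     "tags": ["DET", "NOUN", "VERB", "DET", "NOUN"]},
]

# Query: "The recurrent network minimizes loss"
query_tags = ["DET", "ADJ", "NOUN", "VERB", "NOUN"]
top = retrieve_demonstrations(query_tags, pool, k=2)
top_ids = [demo["id"] for demo in top]
```

Because the ranking looks only at tag sequences, a demonstration from an unrelated domain can still be retrieved when its structure matches the query, which is one way to read the abstract's claim that the approach is domain-agnostic.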
KoLEG: On-the-Fly Korean Legal Knowledge Editing with Continuous Retrieval
Jaehyung Seo | Dahyun Jung | Jaewook Lee | Yongchan Chun | Dongjun Kim | Hwijung Ryu | Donghoon Shin | Heuiseok Lim
Findings of the Association for Computational Linguistics: EMNLP 2025
Korean legal knowledge is subject to frequent temporal updates driven by societal needs and government policies. Even minor modifications to legal provisions can have significant consequences, yet continuously retraining large language models (LLMs) to incorporate such updates is resource-intensive and impractical. To address this, we propose KoLEG, an on-the-fly Korean Legal knowledge editing framework enhanced with continuous retrieval. KoLEG employs an Editing-Aware Learning Strategy and a LawEdit Retriever, which together adaptively integrate subtle linguistic nuances and continuous legislative amendments. To support this task, we construct the Korean Legislative Amendment Dataset, explicitly designed for continuous legal knowledge updates with attention to both temporal dynamics and linguistic subtleties. KoLEG outperforms existing locate-then-edit and retrieval-based editing methods, demonstrating superior effectiveness in legal knowledge editing while preserving linguistic capabilities. Furthermore, KoLEG maintains robust performance in sequential editing, improves performance on precedent application tasks, and is qualitatively validated by legal experts.
Co-authors
- Heui-Seok Lim 5
- Yongchan Chun 4
- Chanjun Park 4
- Minhyuk Kim 3
- Jaehyung Seo 2
- Tanmoy Chakraborty 1
- Nancy Chen 1
- Pham Minh Duc 1
- Xiaoxue Gao 1
- Yujia Hu 1
- Koji Inoue 1
- Jimin Jung 1
- Dahyun Jung 1
- Wiwik Karlina 1
- Tatsuya Kawahara 1
- Roy Ka-Wei Lee 1
- Jaewook Lee 1
- Long Li 1
- Zhengyuan Liu 1
- Chang Liu 1
- Rui Liu 1
- Huiyao Liu 1
- Tuan Luong 1
- Palash Nandi 1
- Hwijung Ryu 1
- Ojasva Saxena 1
- Gyuho Shim 1
- Donghoon Shin 1
- Ryuichi Sumida 1
- Bryan Chen Zhengyu Tan 1
- Xiyan Tao 1
- Wei Tian 1
- Keertana Arun Vasan 1
- Chaojun Wang 1
- Nadya Yuki Wangsajaya 1
- Xing Xie 1
- Weiwen Xu 1
- Fan Xu (徐凡) 1
- Jing Yao 1
- Lingyu Ye 1
- Xiaoyuan Yi 1
- Jeongho Yoon 1
- Weihua Zheng 1
- Bowei Zou (邹博伟) 1