Sangyeon Cho
2026
Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation
Kyomin Hwang | Hyeonjin Kim | Sangyeon Cho | Nojun Kwak
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Kyomin Hwang | Hyeonjin Kim | Sangyeon Cho | Nojun Kwak
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While LLMs are increasingly used in commercial services, they pose privacy risks such as leakage of sensitive personally identifiable information (PII). For LLMs trained on multilingual corpora, Multilingual Machine Unlearning (MMU) aims to remove information across multiple languages. However, prior MMU evaluations fail to capture such cross-linguistic distribution of information, being largely limited to direct extensions of per-language evaluation protocols. To this end, we propose two metrics to evaluate the information spread across languages: the Knowledge Separability Score (KSS) and the Knowledge Persistence Score (KPS). KSS measures the overall unlearning quality across multiple languages, while KPS more specifically aims to assess consistent removal of information among different language pairs. We evaluated various unlearning methods in the multilingual setting with these metrics and conducted comprehensive analyses. Through our investigation, we provide insights into unique phenomena exclusive to MMU and offer a new perspective on MMU evaluation.
2025
Learning to See through Sound: From VggCaps to Multi2Cap for Richer Automated Audio Captioning
Sangyeon Cho | Mingi Kim | Jinkwon Hwang | Jaehoon Go | Minuk Ma | Sunjae Yoon | Junyeong Kim
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Sangyeon Cho | Mingi Kim | Jinkwon Hwang | Jaehoon Go | Minuk Ma | Sunjae Yoon | Junyeong Kim
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Automated Audio Captioning (AAC) aims to generate natural language descriptions of audio content, enabling machines to interpret and communicate complex acoustic scenes. However, current AAC datasets often suffer from short and simplistic captions, limiting model expressiveness and semantic depth. To address this, we introduce **VggCaps**, a new multi-modal dataset that pairs audio with corresponding video and leverages large language models (LLMs) to generate rich, descriptive captions. VggCaps significantly outperforms existing benchmarks in caption length, lexical diversity, and human-rated quality. Furthermore, we propose **Multi2Cap**, a novel AAC framework that learns audio-visual representations through a AV-grounding module during pre-training and reconstructs visual semantics using audio alone at inference. This enables visually grounded captioning in audio-only scenarios. Experimental results on Clotho and AudioCaps demonstrate that Multi2Cap achieves state-of-the-art performance across multiple metrics, validating the effectiveness of cross-modal supervision and LLM-based generation in advancing AAC.