June Hyoung Kwon

2026

Erase Persona, Forget Lore: Benchmarking Multimodal Copyright Unlearning in Large Vision Language Models
June Hyoung Kwon | Jungmin Yun | Youngbin Kim
Proceedings of the Fifteenth Language Resources and Evaluation Conference

Large Vision-Language Models (LVLMs), trained on web-scale data, risk memorizing and regenerating copyrighted visual content like characters and logos, creating significant challenges. Machine unlearning offers a path to mitigate these risks by removing specific content post-training, but evaluating its effectiveness, especially in the complex multimodal setting of LVLMs, remains an open problem. Current evaluation methods often lack robustness or fail to capture the nuances of cross-modal concept erasure. To address this critical gap, we introduce the CoVUBench benchmark, the first framework specifically designed for evaluating copyright content unlearning in LVLMs. CoVUBench utilizes procedurally generated, legally safe synthetic data coupled with systematic visual variations—spanning compositional changes and diverse domain manifestations—to ensure realistic and robust evaluation of unlearning generalization. Our comprehensive, multimodal evaluation protocol assesses both forgetting efficacy from the copyright holder’s perspective and the preservation of general model utility from the deployer’s viewpoint. By rigorously measuring this crucial trade-off, CoVUBench provides a standardized tool to advance the development of responsible and effective unlearning methods for LVLMs.

pdf bib abs

CRiT-QA: Evaluating Multi-hop Reasoning with Counterfactual Chains and Distractor Traps
Jungmin Yun | June Hyoung Kwon | Youngbin Kim
Proceedings of the Fifteenth Language Resources and Evaluation Conference

Evaluating the multi-hop reasoning capabilities of large language models remains a significant challenge. Although current models achieve strong results on existing multi-hop question answering datasets, such performance often masks two critical vulnerabilities: (1) reliance on internal parametric knowledge rather than adherence to the provided context, and (2) exploitation of dataset shortcuts, such as single-document cues or type-matching, that diminish the need for genuine evidence aggregation across multiple documents. We introduce CRiT-QA (Counterfactual Reasoning with Traps), a dataset explicitly designed to address both limitations. To neutralize reliance on memorized knowledge and enforce strict context dependency, CRiT-QA transforms factual reasoning chains with counterfactual entities. Furthermore, it injects multi-anchor distractor chains, plausible but incorrect reasoning paths that diverge at different hops. These traps require models to follow the entire reasoning process rather than exploiting shallow heuristics. Our experiments show that LLMs exhibit substantial performance degradation on CRiT-QA compared to standard datasets, exposing their vulnerability to counterfactual conditions and distractor traps. CRiT-QA thus serves as a rigorous diagnostic tool for evaluating genuine multi-hop reasoning and provides a foundation for developing more reliable, evidence-grounded LLMs.

Co-authors

Youngbin Kim 2
JungMin Yun 2

Venues

LREC2

Fix author