Yijiang River Dong


2025

UNDIAL: Self-Distillation with Adjusted Logits for Robust Unlearning in Large Language Models
Yijiang River Dong | Hongzhou Lin | Mikhail Belkin | Ramon Huerta | Ivan Vulić
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Mitigating the retention of sensitive or private information in large language models is essential for privacy and safety. Existing unlearning methods, such as Gradient Ascent and Negative Preference Optimization, directly tune models to remove unwanted information, but they fine-tune by maximizing loss, the reverse of ordinary loss minimization. This reversal makes training unstable, especially on larger datasets, as the model struggles to balance unlearning against preserving its language capability, leading to over-unlearning. In this paper, we introduce UnDIAL (Unlearning via Self-Distillation on Adjusted Logits), a novel and robust unlearning method. Our approach uses self-distillation to adjust logits and selectively reduce the influence of targeted tokens, ensuring smooth convergence and avoiding catastrophic forgetting even in challenging unlearning tasks with large datasets and sequential unlearning requests. Extensive experiments show that UnDIAL is the first direct-tuning method to achieve both robust unlearning and scalability while maintaining stable training dynamics and resilience to hyperparameter choices.
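
The abstract describes the mechanism only at a high level; the following minimal PyTorch sketch illustrates one plausible reading of it. The function name, tensor shapes, and the suppression strength gamma are illustrative assumptions, and the teacher is approximated here by the model's own detached logits rather than the authors' exact setup.

    import torch
    import torch.nn.functional as F

    def undial_loss(student_logits, target_ids, gamma=5.0):
        # student_logits: (batch, seq_len, vocab) logits on forget-set text
        # target_ids:     (batch, seq_len) tokens whose influence we reduce
        # gamma:          suppression strength; the value here is illustrative
        with torch.no_grad():
            adjusted = student_logits.detach().clone()
            # Push the logit of each targeted token down by gamma ...
            idx = target_ids.unsqueeze(-1)
            adjusted.scatter_(-1, idx, adjusted.gather(-1, idx) - gamma)
            # ... and renormalize into a soft target distribution.
            soft_targets = F.softmax(adjusted, dim=-1)
        # Standard distillation loss: cross-entropy between the student's
        # distribution and the adjusted self-teacher distribution.
        log_probs = F.log_softmax(student_logits, dim=-1)
        return -(soft_targets * log_probs).sum(dim=-1).mean()

Because the target stays a full distribution rather than an inverted loss, gradients remain bounded, which is consistent with the stability claim in the abstract.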

2024

Can LLM be a Personalized Judge?
Yijiang River Dong | Tiancheng Hu | Nigel Collier
Findings of the Association for Computational Linguistics: EMNLP 2024

As large language models (LLMs) gain widespread adoption, ensuring they cater to diverse user needs has become increasingly important. While many researchers have studied LLM personalization and role-playing, they primarily use LLM-as-a-Judge for evaluation without thoroughly examining its validity. This paper investigates the reliability of LLM-as-a-Personalized-Judge: asking LLMs to judge user preferences based on a persona. Our results suggest that LLM-as-a-Personalized-Judge is less reliable for personalization than previously believed, showing low agreement with human ground truth. We observe that the personas provided to the LLM often have limited predictive power for the tasks, which leads us to introduce verbal uncertainty estimation. We find that powerful LLMs are aware of the certainty of their predictions and can achieve high agreement with the ground truth on high-certainty samples, indicating a promising approach to building reliable and scalable proxies for evaluating LLM personalization. Our human annotation reveals that third-person crowd-worker evaluations of personalized preferences are even worse than LLM predictions, highlighting the challenges of evaluating LLM personalization.
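
As a rough illustration of how verbal uncertainty estimation can gate judgments, here is a minimal Python sketch. The llm callable, the JSON response format, the 1-10 certainty scale, and the abstention threshold are all assumptions for illustration, not the paper's actual protocol.

    import json

    JUDGE_PROMPT = """You are given a user persona and two model responses.
    Persona: {persona}
    Response A: {response_a}
    Response B: {response_b}
    Which response would this user prefer? Reply in JSON:
    {{"choice": "A" or "B", "certainty": <integer 1-10>}}"""

    def personalized_judge(llm, persona, response_a, response_b, threshold=8):
        # `llm` is any callable mapping a prompt string to a completion
        # string (hypothetical interface); `threshold` is an assumed cutoff.
        raw = llm(JUDGE_PROMPT.format(persona=persona,
                                      response_a=response_a,
                                      response_b=response_b))
        verdict = json.loads(raw)
        if verdict["certainty"] >= threshold:
            return verdict["choice"]   # high-certainty: usable as a label
        return None                    # low-certainty: abstain

Filtering to high-certainty verdicts trades coverage for agreement with human ground truth, which is the trade-off the abstract points to.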

2023

CoRRPUS: Code-based Structured Prompting for Neurosymbolic Story Understanding
Yijiang River Dong | Lara J. Martin | Chris Callison-Burch
Findings of the Association for Computational Linguistics: ACL 2023

Story generation and understanding, like all NLG/NLU tasks, have seen a surge in neurosymbolic work. Researchers have recognized that, while large language models (LLMs) have tremendous utility, they can be augmented with symbolic methods to perform even better and to compensate for many of the flaws that neural networks have. However, symbolic methods are extremely costly in the time and expertise needed to create them. In this work, we capitalize on state-of-the-art Code-LLMs, such as Codex, to bootstrap the use of symbolic methods for tracking the state of stories and aiding story understanding. We show that our CoRRPUS system and abstracted prompting procedures can beat current state-of-the-art structured LLM techniques on pre-existing story understanding tasks (bAbI Task 2 and Re³) with minimal hand engineering. This work highlights the usefulness of code-based symbolic representations for enabling LLMs to better perform story reasoning tasks.
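
To make code-based state tracking concrete, here is a minimal Python sketch of the kind of symbolic story state a Code-LLM could be prompted to maintain for a bAbI Task 2 style question. The class and method names are hypothetical stand-ins, not the CoRRPUS implementation.

    from dataclasses import dataclass, field

    @dataclass
    class StoryState:
        # Symbolic state a Code-LLM is prompted to update line by line.
        character_locations: dict = field(default_factory=dict)
        object_holders: dict = field(default_factory=dict)

        def move(self, character, location):
            self.character_locations[character] = location

        def pick_up(self, character, obj):
            self.object_holders[obj] = character

        def where_is(self, obj):
            # An object is wherever the character holding it last moved.
            holder = self.object_holders.get(obj)
            return self.character_locations.get(holder)

    # "Mary went to the kitchen. Mary picked up the apple.
    #  Mary went to the garden. Where is the apple?"
    state = StoryState()
    state.move("Mary", "kitchen")
    state.pick_up("Mary", "apple")
    state.move("Mary", "garden")
    assert state.where_is("apple") == "garden"

Representing the state as code lets the LLM emit small, checkable updates instead of free-form prose, which is the advantage the abstract attributes to code-based symbolic representations.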