Yijiang River Dong
2026
Privacy-R1: Privacy-Aware Multi-LLM Agent Collaboration via Reinforcement Learning
Zheng Hui | Yijiang River Dong | Sanhanat Sivapiromrat | Ehsan Shareghi | Nigel Collier
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Zheng Hui | Yijiang River Dong | Sanhanat Sivapiromrat | Ehsan Shareghi | Nigel Collier
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
When users submit queries to Large Language Models (LLMs), their prompts can often contain sensitive data, forcing a difficult choice: Send the query to a powerful proprietary LLM providers to achieving state-of-the-art performance and risk data exposure, or relying on smaller, local models guarantees data privacy but often results in a degradation of task performance. Prior approaches have relied on static pipelines that use LLM rewriting, which shatters linguistic coherence and indiscriminately removes privacy-sensitive information, including task-critical content. We reformulate this challenge (Privacy-Conscious Delegation) as a sequential decision-making problem and introduce a novel reinforcement learning (RL) framework called Privacy-R1 to solve it. Our framework trains an agent to dynamically route text chunks, learning a policy that optimally balances the trade-off between privacy leakage and task performance. It implicitly distinguishes between replaceable Personally Identifiable Information (PII) (which it shields locally) and task-critical PII (which it strategically sends to the remote model for maximal utility). To validate our approach in complex scenarios, we also introduce a new medical dataset with high PII density. Our framework achieves a new state-of-the-art on the privacy-utility frontier, demonstrating the necessity of learned, adaptive policies for deploying LLMs in sensitive environments. Dataset can be found at: https://github.com/zackhuiiiii/Privacy-R1.
Value of Information: A Framework for Human–Agent Communication
Yijiang River Dong | Tiancheng Hu | Zheng Hui | Caiqi Zhang | Ivan Vulić | Andreea Bobu | Nigel Collier
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Yijiang River Dong | Tiancheng Hu | Zheng Hui | Caiqi Zhang | Ivan Vulić | Andreea Bobu | Nigel Collier
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Model (LLM) agents deployed for real-world tasks face a fundamental dilemma: user requests are underspecified, yet agents must decide whether to act on incomplete information or interrupt users for clarification. Existing approaches either rely on brittle confidence thresholds that require task-specific tuning, or fail to account for the varying stakes of different decisions. We introduce a decision-theoretic framework that resolves this trade-off through the Value of Information (VoI), enabling agents to dynamically weigh the expected utility gain from asking questions against the cognitive cost imposed on users. Our inference-time method requires no hyperparameter tuning and adapts seamlessly across contexts—from casual games to medical diagnosis. Experiments across four diverse domains (20 Questions, medical diagnosis, flight booking, and e-commerce) show that VoI consistently matches or exceeds the best manually-tuned baselines, achieving up to 1.36 utility points higher in high-cost settings. This work provides a parameter-free framework for adaptive agent communication that explicitly balances task risk, query ambiguity, and user effort.
2025
UNDIAL: Self-Distillation with Adjusted Logits for Robust Unlearning in Large Language Models
Yijiang River Dong | Hongzhou Lin | Mikhail Belkin | Ramon Huerta | Ivan Vulić
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Yijiang River Dong | Hongzhou Lin | Mikhail Belkin | Ramon Huerta | Ivan Vulić
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Mitigating the retention of sensitive or private information in large language models is essential for enhancing privacy and safety. Existing unlearning methods, like Gradient Ascent and Negative Preference Optimization, directly tune models to remove unwanted information. However, these methods often become unstable because they fine-tune by maximizing loss, which is the opposite of traditional loss minimization in learning. This reversal creates instability, especially on larger datasets, as the model struggles to balance unlearning with maintaining language capacity, leading to over-unlearning. In this paper, we introduce UnDIAL (Unlearning via Self-Distillation on Adjusted Logits), a novel and robust unlearning method. Our approach leverages self-distillation to adjust logits and selectively reduce the influence of targeted tokens. This technique ensures smooth convergence and avoids catastrophic forgetting, even in challenging unlearning tasks with large datasets and sequential unlearning requests. Extensive experiments show that UnDIAL is the first direct tuning method to achieve both robustness in unlearning and scalability, while maintaining stable training dynamics and resilience to hyperparameter tuning.
When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning
Yijiang River Dong | Tiancheng Hu | Yinhong Liu | Ahmet Üstün | Nigel Collier
Findings of the Association for Computational Linguistics: EMNLP 2025
Yijiang River Dong | Tiancheng Hu | Yinhong Liu | Ahmet Üstün | Nigel Collier
Findings of the Association for Computational Linguistics: EMNLP 2025
While Reinforcement Learning from Human Feedback (RLHF) is widely used to align Large Language Models (LLMs) with human preferences, it typically assumes homogeneous preferences across users, overlooking diverse human values and minority viewpoints.Although personalized preference learning addresses this by tailoring separate preferences for individual users, the field lacks standardized methods to assess its effectiveness. We present a multi-faceted evaluation framework that measures not only performance but also fairness, unintended effects, and adaptability across varying levels of preference divergence. Through extensive experiments comparing eight personalization methods across three preference datasets, we demonstrate that performance differences between methods could reach 36% when users strongly disagree, and personalization can introduce up to 20% safety misalignment. These findings highlight the critical need for holistic evaluation approaches to advance the development of more effective and inclusive preference learning systems.
2024
Can LLM be a Personalized Judge?
Yijiang River Dong | Tiancheng Hu | Nigel Collier
Findings of the Association for Computational Linguistics: EMNLP 2024
Yijiang River Dong | Tiancheng Hu | Nigel Collier
Findings of the Association for Computational Linguistics: EMNLP 2024
As large language models (LLMs) gain widespread adoption, ensuring they cater to diverse user needs has become increasingly important. While many researchers have studied LLM personalization and role-playing, they primarily use LLM-as-a-Judge for evaluation without thoroughly examining its validity. This paper investigates the reliability of LLM-as-a-Personalized-Judge—asking LLMs to judge user preferences based on persona. Our results suggest that LLM-as-a-Personalized-Judge is less reliable for personalization than previously believed, showing low agreement with human ground truth. We observed that the personas provided to the LLM often have limited predictive power for the tasks, leading us to introduce verbal uncertainty estimation. We find that powerful LLMs are aware of the certainty of their prediction and can achieve high agreement with ground truth on high-certainty samples, indicating a promising approach for building reliable and scalable proxies for evaluating LLM personalization. Our human annotation reveals that third-person crowd worker evaluations of personalized preferences are even worse than LLM predictions, highlighting the challenges of evaluating LLM personalization.
2023
CoRRPUS: Code-based Structured Prompting for Neurosymbolic Story Understanding
Yijiang River Dong | Lara J. Martin | Chris Callison-Burch
Findings of the Association for Computational Linguistics: ACL 2023
Yijiang River Dong | Lara J. Martin | Chris Callison-Burch
Findings of the Association for Computational Linguistics: ACL 2023
Story generation and understanding—as with all NLG/NLU tasks—has seen a surge in neurosymbolic work. Researchers have recognized that, while large language models (LLMs) have tremendous utility, they can be augmented with symbolic means to be even better and to make up for many flaws that neural networks have. However, symbolic methods are extremely costly in terms of the amount of time and expertise needed to create them. In this work, we capitalize on state-of-the-art Code-LLMs, such as Codex, to bootstrap the use of symbolic methods for tracking the state of stories and aiding in story understanding. We show that our CoRRPUS system and abstracted prompting procedures can beat current state-of-the-art structured LLM techniques on pre-existing story understanding tasks (bAbI Task 2 and Re³) with minimal hand engineering. This work highlights the usefulness of code-based symbolic representations for enabling LLMs to better perform story reasoning tasks.