Yulin Chen

2026

Learning on Imbalanced Noisy Data via Debiased Sample Selection and LLM-Driven Annotation
Bo Yuan | Yulin Chen | Yin Zhang
Findings of the Association for Computational Linguistics: ACL 2026

Learning with Noisy Labels (LNL) addresses the setting in which the collected training set contains incorrect or corrupted labels. Most existing solutions distinguish clean samples from noisy ones and query human experts on the noisy samples for denoising. However, these solutions often rely on the unrealistic assumption that classes are uniformly distributed, overlooking the skewed, imbalanced distributions frequently encountered in real-world scenarios. In this setting, we empirically show that previous solutions suffer from both selection bias and training bias, which makes it hard to distinguish clean samples from noisy ones. In this paper, we introduce the imbalanced learning with noisy labels (i-LNL) task, in which a model must learn from noisy labels under an imbalanced class distribution. We build a new benchmark, ImbaLNL-Bench, comprising synthetic and real-world datasets, to provide a thorough representation of practical use cases. We also propose DeCo, a collaborative learning framework for i-LNL tasks. Specifically, we first perform debiased sample selection, consisting of a robust expert model and a debiased-enhanced threshold strategy, to better separate clean samples from noisy ones, especially in the tail classes. We then feed the selected clean samples to large language models (LLMs) acting as active annotators, which re-annotate the noisy samples via in-context learning and thereby reduce human effort. Finally, we apply distinct loss functions suited to subsets with different degrees of label noise. Extensive experiments on synthetic and real-world datasets demonstrate the effectiveness and superiority of our method.
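A toy sketch of the selection bias described above, assuming a common small-loss criterion (keep the lowest-loss fraction of samples). This is not DeCo's debiased-enhanced threshold strategy, which the abstract does not specify; the class-wise variant is a simple illustrative alternative, and the loss values and the `keep_ratio` parameter are hypothetical. With one global cutoff, a tail class whose clean samples happen to have higher losses is excluded entirely, while applying the same keep ratio within each class retains some tail samples.

```python
import numpy as np

def global_small_loss_mask(losses, keep_ratio):
    """Keep the keep_ratio fraction of lowest-loss samples, ignoring class labels."""
    k = int(round(keep_ratio * len(losses)))
    mask = np.zeros(len(losses), dtype=bool)
    mask[np.argsort(losses)[:k]] = True
    return mask

def classwise_small_loss_mask(losses, labels, keep_ratio):
    """Apply the same keep_ratio separately within each observed label class."""
    mask = np.zeros(len(losses), dtype=bool)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        k = max(1, int(round(keep_ratio * len(idx))))
        # Pick the lowest-loss samples within this class only.
        mask[idx[np.argsort(losses[idx])[:k]]] = True
    return mask

# Hypothetical toy data: a head class (label 0, 90 samples) with small losses
# and a tail class (label 1, 10 samples) whose clean samples have larger losses.
losses = np.concatenate([np.linspace(0.1, 0.5, 90), np.linspace(0.6, 1.0, 10)])
labels = np.array([0] * 90 + [1] * 10)

g = global_small_loss_mask(losses, keep_ratio=0.5)
c = classwise_small_loss_mask(losses, labels, keep_ratio=0.5)
print("tail kept (global):", int(g[labels == 1].sum()))      # prints 0
print("tail kept (class-wise):", int(c[labels == 1].sum()))  # prints 5
```

Both masks keep 50 of the 100 samples; only the allocation across classes differs.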

2025

Weed Out, Then Harvest: Dual Low-Rank Adaptation is an Effective Noisy Label Detector for Noise-Robust Learning
Bo Yuan | Yulin Chen | Yin Zhang
Findings of the Association for Computational Linguistics: ACL 2025

Large language models (LLMs) fine-tuned with parameter-efficient fine-tuning (PEFT) have shown impressive performance on various downstream tasks. However, in many real-world scenarios the collected training data inevitably contains noisy labels. To learn from noisy labels, most solutions select samples with small losses for model training. However, the selected samples in turn affect the loss computation in the next iteration, so an inaccurate initial selection can create a vicious cycle that leads to suboptimal performance. To break this cycle, we propose Delora, a novel framework that decouples sample selection from model training. For sample selection, Delora builds a noisy label detector by introducing a clean LoRA and a noisy LoRA. Benefiting from the memory effect, the clean LoRA is encouraged to memorize clean data, while the noisy LoRA is constrained to memorize mislabeled data and thus serves as a learnable threshold for separating clean and noisy samples. For model training, Delora uses the carefully selected samples to fine-tune language models seamlessly. Experimental results on synthetic and real-world noisy datasets demonstrate the effectiveness of Delora in noisy label detection and text classification.
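The small-loss selection that the abstract describes as the usual approach can be sketched as follows. This is a generic illustration, not Delora: the per-sample losses, the noise flags (unknown in practice), and the hypothetical `keep_ratio` parameter are invented. It shows how a hand-set selection ratio either admits mislabeled samples or discards clean ones when it does not match the actual noise rate.

```python
import numpy as np

def small_loss_selection(losses, keep_ratio):
    """Boolean mask keeping the keep_ratio fraction of lowest-loss samples."""
    k = int(round(keep_ratio * len(losses)))
    mask = np.zeros(len(losses), dtype=bool)
    mask[np.argsort(losses)[:k]] = True
    return mask

def selection_report(mask, is_noisy):
    """Return (noisy samples kept, clean samples discarded) for a selection mask."""
    noisy_kept = int(np.sum(mask & is_noisy))
    clean_dropped = int(np.sum(~mask & ~is_noisy))
    return noisy_kept, clean_dropped

# Hypothetical per-sample losses; 3 of 10 labels are wrong (30% noise).
losses = np.array([0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.90, 1.20, 0.35, 1.50])
is_noisy = np.array([False] * 6 + [True, True, False, True])

for keep_ratio in (0.9, 0.7, 0.5):
    mask = small_loss_selection(losses, keep_ratio)
    # 0.9 keeps 2 noisy samples; 0.7 matches the noise rate; 0.5 drops 2 clean ones.
    print(keep_ratio, selection_report(mask, is_noisy))
```

According to the abstract, Delora avoids relying on such a hand-set criterion by letting the noisy LoRA act as a learnable threshold and by decoupling selection from training; this sketch models neither component.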

Co-authors

Bo Yuan (2)
Yin Zhang (2)

Venues

Findings (2)
