Yongwei Wang
2026
PretrainRL: Alleviating Factuality Hallucination of Large Language Models at the Beginning
Langming Liu | Kangtao Lv | Haibin Chen | Weidong Zhang | Yejing Wang | Shilei Liu | Xin Tong | Yujin Yuan | Yongwei Wang | Wenbo Su | Bo Zheng
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs), despite their powerful capabilities, suffer from factual hallucinations, in which they generate verifiable falsehoods. We identify a root cause of this issue: the imbalanced data distribution in the pretraining corpus, which leads to a state of "low-probability truth" and "high-probability falsehood". Recent approaches, such as teaching models to say "I don’t know" or post-hoc knowledge editing, either evade the problem or face catastrophic forgetting. To address this issue at its root, we propose PretrainRL, a novel framework that integrates reinforcement learning into the pretraining phase to consolidate factual knowledge. The core principle of PretrainRL is "debiasing then learning." It actively reshapes the model’s probability distribution by down-weighting high-probability falsehoods, thereby making "room" for low-probability truths to be learned effectively. To enable this, we design an efficient negative sampling strategy to discover these high-probability falsehoods and introduce novel metrics to evaluate the model’s probabilistic state concerning factual knowledge. Extensive experiments on three public benchmarks demonstrate that PretrainRL significantly alleviates factual hallucinations and outperforms state-of-the-art methods.
2025
How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models
Kangtao Lv | Haibin Chen | Yujin Yuan | Langming Liu | Shilei Liu | Yongwei Wang | Wenbo Su | Bo Zheng
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) have attracted significant attention due to their impressive general capabilities across diverse downstream tasks. However, without domain-specific optimization, they often underperform on specialized knowledge benchmarks and even produce hallucinations. Recent studies show that strategically infusing domain knowledge during pretraining can substantially improve downstream performance. A critical challenge lies in balancing this infusion trade-off: injecting too little domain-specific data yields insufficient specialization, whereas excessive infusion triggers catastrophic forgetting of previously acquired knowledge. In this work, we focus on the phenomenon of memory collapse induced by over-infusion. Through systematic experiments, we make two key observations: (1) Critical collapse point: each model exhibits a threshold beyond which its knowledge retention capabilities sharply degrade. (2) Scale correlation: these collapse points scale consistently with the model’s size. Building on these insights, we propose a knowledge infusion scaling law that predicts the optimal amount of domain knowledge to inject into large LLMs by analyzing their smaller counterparts. Extensive experiments across different model sizes and pretraining token budgets validate both the effectiveness and generalizability of our scaling law.