Yijia Fan
2026
Reinforcement Learning for Diffusion LLMs via Energy-Based Gibbs Alignment
Yijia Fan | Jing Yang | Mingyu Liu | Kaitong Cai | Jian Wang | Keze Wang | Jusheng Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Diffusion Large Language Models (dLLMs) have emerged as a promising non-autoregressive paradigm for text generation, offering parallel decoding and bidirectional context modeling. However, aligning dLLMs with reinforcement learning (RL) remains a significant challenge, as the marginal likelihood of sequences in masked diffusion is typically intractable, rendering standard policy gradient methods unstable or computationally prohibitive. In this work, we propose **Diffusion-Gibbs Alignment (DGA)**, a novel variational framework that reformulates RL for dLLMs as a distribution matching problem. DGA bypasses the explicit computation of log-probabilities by leveraging a learned energy function to model the relative quality of samples. The optimization is decoupled into two stable steps: (1) contrastive energy ranking to capture global reward structures, and (2) weighted diffusion alignment to update the policy via importance sampling. Empirically, DGA establishes a new state-of-the-art across logical reasoning (Sudoku, Countdown), mathematical reasoning (GSM8K, Math500), and code generation (HumanEval, MBPP) benchmarks. DGA offers a novel variational perspective for dLLM alignment, achieving better performance while simultaneously enhancing training speed and memory efficiency.
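The two decoupled steps in the abstract above can be illustrated with a minimal sketch. The abstract does not state the loss or the weighting rule, so everything below is an assumption made for illustration: a pairwise logistic ranking loss stands in for "contrastive energy ranking", and self-normalized Gibbs weights proportional to exp(-E / temperature) stand in for the importance weights. The function names and the `temperature` parameter are hypothetical and not taken from the paper.

```python
import numpy as np

def pairwise_ranking_loss(energies, rewards):
    """Assumed stand-in for step (1): push higher-reward samples to lower energy."""
    e = np.asarray(energies, dtype=float)
    r = np.asarray(rewards, dtype=float)
    diff = e[:, None] - e[None, :]        # diff[i, j] = E_i - E_j
    better = r[:, None] > r[None, :]      # pairs where sample i beats sample j
    if not better.any():
        return 0.0
    # softplus(E_i - E_j) is small when the better sample has the lower energy
    return float(np.mean(np.logaddexp(0.0, diff[better])))

def gibbs_weights(energies, temperature=1.0):
    """Assumed stand-in for step (2): self-normalized weights proportional to exp(-E / T)."""
    logits = -np.asarray(energies, dtype=float) / temperature
    logits = logits - logits.max()        # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# Toy batch: three sampled completions with scalar rewards (made-up numbers).
rewards = [3.0, 2.0, 1.0]
aligned_energies = [0.0, 1.0, 2.0]        # lower energy for higher reward
misaligned_energies = [2.0, 1.0, 0.0]     # ordering reversed

loss_aligned = pairwise_ranking_loss(aligned_energies, rewards)
loss_misaligned = pairwise_ranking_loss(misaligned_energies, rewards)
weights = gibbs_weights(aligned_energies)
```

In this sketch the weights would multiply a per-sample diffusion (denoising) loss, so that no sequence-level log-probability is needed; how DGA actually combines them is not specified in the abstract.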
Nash-Pruned CredMAS: Dynamic Panel Pruning for VLM-MAS using Nash-based Selection and Doubly-Robust Credits
Yijia Fan | Mingyu Liu | Jing Yang | Jian Wang | Keze Wang | Jusheng Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Multi-round Vision-Language Model (VLM) Multi-Agent Systems (MAS) offer powerful reasoning capabilities but suffer from prohibitive costs due to static panel designs, where all N agents communicate in every one of the T rounds. This approach is fundamentally inefficient, as it ignores the context-dependent and diminishing marginal utility of specific agents. To address this, we propose Nash-CredMAS, an economic framework that transforms agent selection into a dynamic resource allocation game. Unlike heuristic routing or one-time pruning, our method operates in two phases: (1) Offline Causal Value Learning, where we employ a doubly-robust (AIPW) estimator to train a context-aware value function from biased interaction logs, effectively learning the true marginal contribution of agents; and (2) Online Dynamic Auctions, where agents bid for communication slots based on their predicted utility. We formulate the inference-time selection as a submodular maximization problem under budget constraints, theoretically guaranteeing a (1 - 1/e)-approximation of the optimal coalition via a greedy strategy. Empirically, Nash-CredMAS achieves state-of-the-art results on challenging benchmarks, including MMMU and V*-Bench, while reducing token consumption by over 25% compared to static baselines. The system naturally converges to an economic equilibrium where agents remain silent when their marginal value does not justify the cost.
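The greedy strategy mentioned above can be sketched as follows. This is a generic greedy routine for a monotone submodular set function, not the authors' implementation. It assumes the budget is a cap on the number of communication slots (a cardinality constraint, the setting in which plain greedy is known to attain the (1 - 1/e) bound), and it uses a toy coverage function in place of the learned value function. The agent names and skill sets are made up.

```python
from typing import Callable, FrozenSet, Iterable, List

def greedy_select(
    agents: Iterable[str],
    value: Callable[[FrozenSet[str]], float],
    budget: int,
) -> List[str]:
    """Pick up to `budget` agents, each time taking the largest marginal gain."""
    chosen: List[str] = []
    remaining = list(agents)
    for _ in range(budget):
        current = value(frozenset(chosen))
        best, best_gain = None, 0.0
        for agent in remaining:
            gain = value(frozenset(chosen + [agent])) - current
            if gain > best_gain:          # strictly positive gain required
                best, best_gain = agent, gain
        if best is None:                  # nobody adds value: leave slots unused
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy value function (hypothetical): number of distinct sub-problems covered.
skills = {"ocr": {1, 2}, "math": {2, 3, 4}, "chart": {4}, "vqa": {1, 5}}

def coverage(selected: FrozenSet[str]) -> float:
    return float(len(set().union(*(skills[a] for a in selected))))

panel_two_slots = greedy_select(skills, coverage, budget=2)   # ['math', 'vqa']
panel_four_slots = greedy_select(skills, coverage, budget=4)  # ['math', 'vqa']
```

With four slots available the routine still stops after two picks, because the remaining agents have zero marginal gain. This loosely mirrors the behavior described above, where agents stay silent once their marginal value no longer justifies the cost.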
2025
CCG: Rare-Label Prediction via Neural SEM–Driven Causal Game
Yijia Fan | Jusheng Zhang | Kaitong Cai | Jing Yang | Keze Wang
Findings of the Association for Computational Linguistics: EMNLP 2025
Multi-label classification (MLC) faces persistent challenges from label imbalance, spurious correlations, and distribution shifts, especially in rare label prediction. We propose the Causal Cooperative Game (CCG) framework, which models MLC as a multi-player cooperative process. CCG integrates explicit causal discovery via Neural Structural Equation Models, a counterfactual curiosity reward to guide robust feature learning, and a causal invariance loss to ensure generalization across environments, along with targeted rare label enhancement. Extensive experiments on benchmark datasets demonstrate that CCG significantly improves rare label prediction and overall robustness compared to strong baselines. Ablation and qualitative analyses further validate the effectiveness and interpretability of each component. Our work highlights the promise of combining causal inference and cooperative game theory for more robust and interpretable multi-label learning.
Towards More Efficient Post-training via Fourier Domain Adapter Framework
Yijia Fan | Jusheng Zhang | Keze Wang
Findings of the Association for Computational Linguistics: EMNLP 2025
We introduce Fourier Domain Adapter (FDA), a novel and parameter-efficient framework for fine-tuning large-scale pre-trained language models. FDA reparameterizes the core projection operation of the adapter module directly in the Fourier domain. This involves transforming the input features via discrete Fourier transform (DFT), applying sparse learnable complex modulations in frequency space, and then back-transforming via inverse DFT, supplemented by highly compact auxiliary linear layers. This approach significantly reduces the number of trainable parameters while enhancing the model’s ability to capture salient frequency-based semantic information. Comprehensive experiments on GLUE, E2E NLG, and instruction tuning benchmarks show that FDA consistently outperforms existing parameter-efficient fine-tuning (PEFT) methods. It can achieve better performance with nearly 100x fewer trainable parameters than established methods such as LoRA and AdapterH. Our results demonstrate that FDA is a robust solution for developing efficient and powerful language models.
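The transform, modulate, and back-transform pipeline described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the FDA implementation: it applies a real FFT along the feature axis, modulates only a chosen subset of frequency bins with complex weights, and adds the result back to the input as a residual. The residual connection, the function name, and the choice of bins are assumptions, and the auxiliary linear layers mentioned in the abstract are omitted.

```python
import numpy as np

def fourier_adapter(x, freq_idx, weights):
    """Sketch of a frequency-domain adapter.

    x        : real array of shape (batch, d)
    freq_idx : integer indices of the rfft bins that carry learnable weights
    weights  : complex array, one modulation coefficient per selected bin
    """
    spectrum = np.fft.rfft(x, axis=-1)                 # shape (batch, d // 2 + 1)
    delta = np.zeros_like(spectrum)                    # all other bins stay zero
    delta[..., freq_idx] = spectrum[..., freq_idx] * weights
    update = np.fft.irfft(delta, n=x.shape[-1], axis=-1)
    return x + update                                  # assumed residual connection

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))

# Sparse case: only 2 of the 5 available bins are modulated (4 real parameters).
sparse_idx = np.array([1, 3])
sparse_w = np.array([0.5 + 0.1j, -0.2 + 0.0j])
y_sparse = fourier_adapter(x, sparse_idx, sparse_w)

# Sanity cases: zero weights leave x unchanged; unit weights on every bin give 2x.
all_idx = np.arange(x.shape[-1] // 2 + 1)
y_zero = fourier_adapter(x, all_idx, np.zeros(all_idx.size, dtype=complex))
y_unit = fourier_adapter(x, all_idx, np.ones(all_idx.size, dtype=complex))
```

In this sketch the trainable parameter count is two real numbers per selected bin, independent of the batch size, which is the kind of saving the abstract attributes to sparse modulation in frequency space.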
DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-off
Jusheng Zhang | Yijia Fan | Kaitong Cai | Zimeng Huang | Xiaofei Sun | Jian Wang | Chengpei Tang | Keze Wang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
This paper introduces DrDiff, a novel framework for long-text generation that overcomes the efficiency-quality trade-off through three core technologies. First, we design a dynamic expert scheduling mechanism that intelligently allocates computational resources during the diffusion process based on text complexity, enabling more efficient handling of text generation tasks of varying difficulty. Second, we introduce a Hierarchical Sparse Attention (HSA) mechanism that adaptively adjusts attention patterns according to input length, reducing computational complexity from O(n^2) to O(n) while maintaining model performance. Finally, we propose a Semantic Anchor States (SAS) module that combines with DPM-solver++ to reduce diffusion steps, significantly improving generation speed. Comprehensive experiments on various long-text generation benchmarks demonstrate the superiority of DrDiff over existing SOTA methods.
OSC: Cognitive Orchestration through Dynamic Knowledge Alignment in Multi-Agent LLM Collaboration
Jusheng Zhang | Yijia Fan | Kaitong Cai | Xiaofei Sun | Keze Wang
Findings of the Association for Computational Linguistics: EMNLP 2025
This paper introduces OSC (Orchestrating Cognitive Synergy), a knowledge-aware adaptive collaboration framework designed to enhance cognitive synergy in multi-agent systems with large language models. While prior work has advanced agent selection and result aggregation, efficient linguistic interactions for deep collaboration among expert agents remain a critical bottleneck. OSC addresses this gap as a pivotal intermediate layer between selection and aggregation, introducing Collaborator Knowledge Models (CKM) to enable each agent to dynamically perceive its collaborators’ cognitive states. Through real-time cognitive gap analysis, agents adaptively adjust communication behaviors, including content focus, detail level, and expression style, using learned strategies. Experiments on complex reasoning and problem-solving benchmarks demonstrate that OSC significantly improves task performance and communication efficiency, transforming “parallel-working individuals” into a “deeply collaborative cognitive team”.