Chenchen Jing
2026
Efficient Self-Evaluation for Diffusion Language Models via Sequence Regeneration
Linhao Zhong | Linyu Wu | Wen Wang | Yuling Xi | Chenchen Jing | Jiaheng Zhang | Hao Chen | Chunhua Shen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Diffusion large language models (dLLMs) have recently attracted significant attention for their ability to enhance diversity, controllability, and parallelism. However, their non-sequential, bidirectionally masked generation makes quality assessment difficult, underscoring the need for effective self-evaluation. In this work, we propose DiSE, a simple yet effective self-evaluation confidence quantification method for dLLMs. DiSE quantifies confidence by computing the probability of regenerating the tokens in the entire generated sequence, given the full context. This method enables more efficient and reliable quality assessment by leveraging token regeneration probabilities, facilitating both likelihood estimation and robust uncertainty quantification. Building upon DiSE, we further introduce a flexible-length generation framework, which adaptively controls the sequence length based on the model’s self-assessment of its own output. We analyze and validate the feasibility of DiSE from the perspective of dLLM generalization, and empirically demonstrate that DiSE is positively correlated with both semantic coherence and answer accuracy. Extensive experiments on likelihood evaluation, uncertainty quantification, and flexible-length generation further confirm the effectiveness of the proposed DiSE.
Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models
Linhao Zhong | Linyu Wu | Bozhen Fang | Tianjian Feng | Chenchen Jing | Wen Wang | Jiaheng Zhang | Hao Chen | Chunhua Shen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Diffusion Language Models (DLMs) offer a promising alternative for language modeling by enabling parallel decoding through iterative refinement. However, most DLMs rely on hard binary masking and discrete token assignments, which hinder the revision of early decisions and underutilize intermediate probabilistic representations. In this paper, we propose EvoToken-DLM, a novel diffusion-based language modeling approach that replaces hard binary masks with evolving soft token distributions. EvoToken-DLM enables a progressive transition from masked states to discrete outputs, supporting revisable decoding. To effectively support this evolution, we introduce continuous trajectory supervision, which aligns training objectives with iterative probabilistic updates. Extensive experiments across multiple benchmarks show that EvoToken-DLM consistently achieves superior performance, outperforming strong diffusion-based and masked DLM baselines. Our code is available at https://github.com/aim-uofa/EvoTokenDLM.
2025
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
Jianyu Liu | Hangyu Guo | Ranjie Duan | Xingyuan Bu | Yancheng He | Shilong Li | Hui Huang | Jiaheng Liu | Yucheng Wang | Chenchen Jing | Xingwei Qu | Xiao Zhang | Pei Wang | Yanan Wu | Jihao Gu | Yangguang Li | Jianke Zhu
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Multimodal Large Language Models (MLLMs) pose unique safety challenges due to their integration of visual and textual data, thereby introducing new dimensions of potential attacks and complex risk combinations. In this paper, we begin with a detailed analysis aimed at disentangling risks through step-by-step reasoning within multimodal inputs. We find that systematic multimodal risk disentanglement substantially enhances the risk awareness of MLLMs. By leveraging the strong discriminative abilities of multimodal risk disentanglement, we further introduce DREAM (Disentangling Risks to Enhance Safety Alignment in MLLMs), a novel approach that enhances safety alignment in MLLMs through supervised fine-tuning and iterative Reinforcement Learning from AI Feedback (RLAIF). Experimental results show that DREAM significantly boosts safety during both inference and training phases without compromising performance on normal tasks (that is, without inducing oversafety), achieving a 16.17% improvement in the SIUO safe&effective score compared to GPT-4V.
2024
In-Context Compositional Generalization for Large Vision-Language Models
Chuanhao Li | Chenchen Jing | Zhen Li | Mingliang Zhai | Yuwei Wu | Yunde Jia
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Recent work has revealed that in-context learning for large language models exhibits compositional generalization capacity, which can be enhanced by selecting in-context demonstrations similar to test cases to provide contextual information. However, achieving in-context compositional generalization (ICCG) in large vision-language models (LVLMs) is non-trivial. Due to the inherent asymmetry between visual and linguistic modalities, ICCG in LVLMs faces an inevitable challenge: redundant information in the visual modality. The redundant information affects in-context learning in two ways: (1) Similarity calculation may be dominated by redundant information, resulting in sub-optimal demonstration selection. (2) Redundant information in in-context demonstrations introduces misleading contextual information into in-context learning. To alleviate these problems, we propose a demonstration selection method to achieve ICCG for LVLMs by considering two key factors of demonstrations, content and structure, from a multimodal perspective. Specifically, we design a diversity-coverage-based matching score to select demonstrations with maximum coverage, and avoid selecting demonstrations with redundant information via their content redundancy and structural complexity. We build a GQA-ICCG dataset to simulate the ICCG setting, and conduct experiments on GQA-ICCG and the VQA v2 dataset. Experimental results demonstrate the effectiveness of our method.
Co-authors
- Hao Chen 2
- Chunhua Shen 2
- Wen Wang 2
- Linyu Wu 2
- Jiaheng Zhang 2
- Linhao Zhong 2
- Xingyuan Bu 1
- Ranjie Duan 1
- Bozhen Fang 1
- Tianjian Feng 1
- Jihao Gu 1
- Hangyu Guo 1
- Yancheng He 1
- Hui Huang 1
- Yunde Jia 1
- Shilong Li 1
- Yangguang Li 1
- Chuanhao Li 1
- Zhen Li 1
- Jianyu Liu 1
- Jiaheng Liu 1
- Xingwei Qu 1
- Yucheng Wang 1
- Pei Wang 1
- Yanan Wu 1
- Yuwei Wu 1
- Yuling Xi 1
- Mingliang Zhai 1
- Xiao Zhang 1
- Jianke Zhu 1