James Kwok
2026
Diffusion with Truncated Blocks: Fast and High-Quality Text Generation using Truncated Block Generation
Yuyan Zhou | Weiyu Chen | James Kwok
Findings of the Association for Computational Linguistics: ACL 2026
Diffusion-based Large Language Models (dLLMs) are emerging as a powerful alternative to traditional autoregressive models. These models learn to generate text by iteratively denoising masked sequences. In this work, we identify a critical problem in dLLMs: the model’s attention is wastefully expended on uninformative mask tokens, diluting its focus on meaningful context. We term this phenomenon “attention dilution”. We further show that this artifact is amplified by token-level noising, whereas models employing sequence-level noise exhibit a reduced effect. To resolve this problem, we introduce Truncated Block Generation, a novel sampling algorithm that not only mitigates attention dilution but also enables faster inference and flexible-length sequence generation. Extensive experiments validate our analysis and demonstrate the marked effectiveness of our proposed method in enhancing both the performance and efficiency of dLLMs.
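As a toy illustration of the "attention dilution" effect described in this abstract, the sketch below computes how much softmax attention mass remains on context tokens as more mask tokens join the sequence. The logit values and the helper name `context_attention_mass` are assumptions made for illustration; this is not the paper's analysis or its Truncated Block Generation algorithm.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def context_attention_mass(context_logits, mask_logit, n_masks):
    """Fraction of attention that lands on context tokens.

    Hypothetical setup: every mask token receives the same logit,
    and context tokens receive the given (higher) logits.
    """
    weights = softmax(list(context_logits) + [mask_logit] * n_masks)
    return sum(weights[:len(context_logits)])

# More mask tokens in the sequence leave less attention for the context.
for n in (0, 8, 64):
    print(n, round(context_attention_mass([2.0, 1.0], 0.0, n), 3))
```

Under these assumptions, keeping fewer mask tokens in the sequence leaves more attention mass on the informative context, which is the intuition the abstract gives for truncating blocks.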
Sculpting the Vector Space: Towards Efficient Multi-Vector Visual Document Retrieval via Prune-then-Merge Framework
Yibo Yan | Mingdong Ou | Yi Cao | Xin Zou | Jiahao Huo | Shuliang Liu | James Kwok | Xuming Hu
Findings of the Association for Computational Linguistics: ACL 2026
Visual Document Retrieval (VDR), which aims to retrieve relevant pages within vast corpora of visually-rich documents, is of significance in current multimodal retrieval applications. The state-of-the-art multi-vector paradigm excels in performance but suffers from prohibitive overhead, a problem that current efficiency methods like pruning and merging address imperfectly, creating a difficult trade-off between compression rate and feature fidelity. To overcome this dilemma, we introduce **Prune-then-Merge**, a novel two-stage framework that synergizes these complementary approaches. Our method first employs an adaptive pruning stage to filter out low-information patches, creating a refined, high-signal set of embeddings. Subsequently, a hierarchical merging stage compresses this pre-filtered set, effectively summarizing semantic content without the noise-induced feature dilution seen in single-stage methods. **Extensive experiments on 29 VDR datasets demonstrate that our framework consistently outperforms existing methods, significantly extending the near-lossless compression range and providing robust performance at high compression ratios.**
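The two-stage idea in this abstract can be sketched roughly as follows. Everything specific here is an assumption: the per-patch importance `scores`, the mean-plus-`k`-standard-deviations threshold, and the averaging of consecutive vectors (a simple stand-in for the hierarchical merging stage) are illustrative choices, not the authors' method.

```python
from statistics import mean, pstdev

def prune(embeddings, scores, k=0.0):
    """Stage 1 (assumed rule): keep patches scoring at least mean + k * std."""
    threshold = mean(scores) + k * pstdev(scores)
    kept = [e for e, s in zip(embeddings, scores) if s >= threshold]
    if kept:
        return kept
    # Fall back to the single best patch so the output is never empty.
    best = max(range(len(scores)), key=scores.__getitem__)
    return [embeddings[best]]

def merge(embeddings, factor=2):
    """Stage 2 (stand-in): average each run of `factor` consecutive vectors."""
    merged = []
    for i in range(0, len(embeddings), factor):
        group = embeddings[i:i + factor]
        dim = len(group[0])
        merged.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return merged

def prune_then_merge(embeddings, scores, k=0.0, factor=2):
    """Filter low-score patches first, then compress what survives."""
    return merge(prune(embeddings, scores, k), factor)

patches = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0], [0.0, 0.0], [6.0, 2.0]]
scores = [0.9, 0.1, 0.8, 0.7, 0.05, 0.6]
print(prune_then_merge(patches, scores))  # six patch vectors reduced to two
```

Pruning before merging means the averaged vectors are built only from patches that passed the filter, which mirrors the abstract's point about avoiding noise-induced dilution in the merged features.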
2025
Nested-Refinement Metamorphosis: Reflective Evolution for Efficient Optimization of Networking Problems
Shuhan Guo | Nan Yin | James Kwok | Quanming Yao
Findings of the Association for Computational Linguistics: ACL 2025
Large Language Models (LLMs) excel in network algorithm design but suffer from inefficient iterative coding and high computational costs. Drawing inspiration from butterfly metamorphosis—where structured developmental phases (Phase I: larval nutrient accumulation → Phase II: pupal transformation) enable adaptive evolution—we propose Nested-Refinement Metamorphosis (NeRM). Building on this principle, we introduce Metamorphosis on Prompts (MoP) to iteratively refine task descriptions (e.g. latency / bandwidth constraints) and Metamorphosis on Algorithms (MoA) to generate more effective solutions (e.g. appropriate network processing architecture). Their nested refinement ensures task-algorithm alignment, systematically improving both task descriptions and algorithmic solutions for more efficient algorithm design. To further enhance efficiency, we incorporate predictor-assisted code evaluation, mimicking natural selection by filtering out weak candidates early and reducing computational costs. Experimental results on TSP (routing), MKP (resource allocation), and CVRP (service-network coordination) demonstrate that NeRM consistently outperforms state-of-the-art approaches in both performance and efficiency.
End-to-End Optimization for Multimodal Retrieval-Augmented Generation via Reward Backpropagation
Zhiyuan Fan | Longfei Yun | Ming Yan | Yumeng Wang | Dadi Guo | Brian Mak | James Kwok | Yi R. Fung
Findings of the Association for Computational Linguistics: EMNLP 2025
Multimodal Retrieval-Augmented Generation (MM-RAG) has emerged as a promising approach for enhancing the reliability and factuality of large vision-language models (LVLMs). Because end-to-end loss backpropagation is infeasible due to non-differentiable operations during the forward process, current methods primarily focus on component-level optimizations, which necessitate extensive component-specific training datasets and suffer from a gap between local and global optimization objectives. In this paper, we propose a new paradigm that backpropagates global rewards from the system output to each component and then transforms these rewards into specific local losses, enabling each component to perform gradient descent and thus ensuring end-to-end optimization. Specifically, we first insert two lightweight multimodal components, a query translator and an adaptive reranker, to address the heterogeneity of multimodal knowledge and the varying knowledge demands for different questions, and then tune only these inserted components using our proposed paradigm to integrate the entire system. Our method achieves SOTA performance on multiple knowledge-intensive multimodal benchmarks with high training efficiency, relying exclusively on supervised signals from an external reward model. Experimental results and our detailed analysis of the evolution of components during training collectively reveal the advantages and considerable potential of this paradigm as a promising direction for MM-RAG research.
Mixture of insighTful Experts (MoTE): The Synergy of Reasoning Chains and Expert Mixtures in Self-Alignment
Zhili Liu | Yunhao Gou | Kai Chen | Lanqing Hong | Jiahui Gao | Fei Mi | Yu Zhang | Zhenguo Li | Xin Jiang | Qun Liu | James Kwok
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
As the capabilities of large language models (LLMs) continue to expand, aligning these models with human values remains a significant challenge. Recent studies show that reasoning abilities contribute significantly to model safety, while integrating Mixture-of-Experts (MoE) architectures can further enhance alignment. In this work, we address a fundamental question: how can reasoning abilities and MoE architectures be effectively incorporated into the self-alignment process in LLMs? We propose Mixture of insighTful Experts (MoTE), a novel framework that synergistically combines reasoning chains and expert mixtures to improve self-alignment. From a data perspective, MoTE employs a structured reasoning chain comprising four key stages: Question Analysis, Answer Guidance, Safe Answer, and Safety Checking. This approach enhances safety through multi-step reasoning and proves effective even for smaller and less powerful LLMs (e.g., 7B models). From an architectural perspective, MoTE adopts a multi-LoRA framework with step-level routing, where each expert is dedicated to a specific reasoning step. This design eliminates the need for balance losses, ensures stable training, and supports adaptive inference lengths. Experimental results demonstrate that MoTE significantly improves model safety, jailbreak resistance, and over-refusal capabilities, achieving performance comparable to OpenAI’s state-of-the-art o1 model.
Corrupted but Not Broken: Understanding and Mitigating the Negative Impacts of Corrupted Data in Visual Instruction Tuning
Yunhao Gou | Hansi Yang | Zhili Liu | Kai Chen | Yihan Zeng | Lanqing Hong | Zhenguo Li | Qun Liu | Bo Han | James Kwok | Yu Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Visual Instruction Tuning (VIT) aims to enhance Multimodal Large Language Models (MLLMs), yet its effectiveness is often compromised by corrupted datasets with issues such as hallucinated content, incorrect responses, and poor OCR quality. Previous approaches to address these challenges have focused on refining datasets through high-quality data collection or rule-based filtering that can be costly or limited in scope. In this paper, we conduct a systematic investigation into the impact of corrupted data on MLLMs and discover that, although corrupted data degrade model performance, such adverse effects are largely reversible, and MLLMs are corrupted but not broken. Specifically, we find that disabling a small subset of parameters can almost fully restore performance. Moreover, corrupted MLLMs inherently possess the capability to differentiate between clean and corrupted samples, facilitating dataset cleaning without external intervention. Building on these insights, we introduce a corruption-robust training paradigm that significantly surpasses existing strategies for mitigating the effects of corrupted data.
2024
Forward-Backward Reasoning in Large Language Models for Mathematical Verification
Weisen Jiang | Han Shi | Longhui Yu | Zhengying Liu | Yu Zhang | Zhenguo Li | James Kwok
Findings of the Association for Computational Linguistics: ACL 2024
Self-Consistency samples diverse reasoning chains with answers and chooses the final answer by majority voting. It is based on forward reasoning and cannot further improve performance by sampling more reasoning chains when saturated. To further boost performance, we introduce backward reasoning to verify candidate answers. Specifically, for mathematical tasks, we mask a number in the question and ask the LLM to answer a backward question created by a simple template, i.e., to predict the masked number when a candidate answer is provided. Instead of using forward or backward reasoning alone, we propose **FOBAR** to combine **FO**rward and **BA**ckward **R**easoning for verification. Extensive experiments on six standard mathematical data sets and three LLMs show that FOBAR achieves state-of-the-art performance. In particular, FOBAR outperforms Self-Consistency, which uses forward reasoning alone, demonstrating that combining forward and backward reasoning is more accurate in verification. In addition, FOBAR achieves higher accuracy than existing verification methods, showing the effectiveness of the simple template used in backward reasoning and the proposed combination.
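A minimal sketch of how forward votes and backward verification could be combined is shown below. The abstract does not state the combination rule, so the weighted geometric mean with weight `alpha`, the smoothing constant `eps`, and all function names are assumptions for illustration; `backward_hits` is a hypothetical tally of how many backward chains recovered the masked number for each candidate answer.

```python
from collections import Counter

def forward_scores(answers):
    """Self-Consistency style vote shares over sampled final answers."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

def backward_scores(candidates, backward_hits, eps=1e-8):
    """Share of successful backward checks per candidate (smoothed by eps)."""
    total = sum(backward_hits.get(a, 0) + eps for a in candidates)
    return {a: (backward_hits.get(a, 0) + eps) / total for a in candidates}

def combine(forward, backward, alpha=0.5):
    """Assumed rule: weighted geometric mean of the two scores, renormalized."""
    raw = {a: forward[a] ** (1 - alpha) * backward[a] ** alpha for a in forward}
    z = sum(raw.values())
    return {a: v / z for a, v in raw.items()}

def select_answer(answers, backward_hits, alpha=0.5):
    """Pick the candidate with the highest combined score."""
    fwd = forward_scores(answers)
    bwd = backward_scores(list(fwd), backward_hits)
    combined = combine(fwd, bwd, alpha)
    return max(combined, key=combined.get)

# Forward voting alone prefers "18"; backward checks favor "20".
sampled = ["18", "18", "18", "20", "20"]
hits = {"18": 1, "20": 6}
print(select_answer(sampled, hits, alpha=0.0))  # forward voting only
print(select_answer(sampled, hits, alpha=0.5))  # forward and backward combined
```

With `alpha=0.0` the selection reduces to plain majority voting, so the weight makes it easy to compare forward-only selection against the combined score on the same samples.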
2023
KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph Completion
Yanbin Wei | Qiushi Huang | Yu Zhang | James Kwok
Findings of the Association for Computational Linguistics: EMNLP 2023
Knowledge Graph Completion (KGC) is crucial for addressing knowledge graph incompleteness and supporting downstream applications. Many models have been proposed for KGC, and they can be categorized into two main classes: triple-based and text-based approaches. Triple-based methods struggle with long-tail entities due to limited structural information and imbalanced distributions of entities. Text-based methods alleviate this issue but require costly training for language models and specific finetuning for knowledge graphs, which limits their efficiency. To address the limitations of both approaches, we propose KICGPT, a framework that integrates a large language model (LLM) and a triple-based KGC retriever to alleviate the long-tail problem without incurring additional training overhead. In KICGPT, we propose an in-context learning strategy called Knowledge Prompt, which encodes structural knowledge into demonstrations to guide the LLM. Empirical results on benchmark datasets demonstrate the effectiveness of the proposed KICGPT model with lighter training overhead and no finetuning.
Co-authors
- Yu Zhang 4
- Zhenguo Li 3
- Kai Chen 2
- Yunhao Gou 2
- Lanqing Hong 2
- Zhili Liu 2
- Qun Liu 2
- Yi Cao 1
- Weiyu Chen 1
- Zhiyuan Fan 1
- Yi R. Fung 1
- Jiahui Gao 1
- Shuhan Guo 1
- Dadi Guo 1
- Bo Han 1
- Xuming Hu 1
- Qiushi Huang 1
- Jiahao Huo 1
- Xin Jiang 1
- Weisen Jiang 1
- Shuliang Liu 1
- Zhengying Liu 1
- Brian Mak 1
- Fei Mi 1
- Mingdong Ou 1
- Han Shi 1
- Yumeng Wang 1
- Yanbin Wei 1
- Ming Yan 1
- Yibo Yan 1
- Hansi Yang 1
- Quanming Yao 1
- Nan Yin 1
- Longhui Yu 1
- Longfei Yun 1
- Yihan Zeng 1
- Yuyan Zhou 1
- Xin Zou 1