Liancheng Fang
2026
d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models
Leyi Pan | Shuchang Tao | Yunpeng Zhai | Zheyu Fu | Liancheng Fang | Minghua He | Lingzhe Zhang | Zhaoyang Liu | Bolin Ding | Aiwei Liu | Lijie Wen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reinforcement learning (RL) is pivotal for enhancing the reasoning capabilities of diffusion large language models (dLLMs). However, existing dLLM policy optimization methods suffer from two critical reliability bottlenecks: (1) reward sparsity, arising from coarse or unverifiable signals that impede accurate advantage calculation; and (2) probability estimates that do not account for the gap to the unbiased expectation over all decoding orders, which is intractable to compute. To mitigate these issues, we propose d-TreeRPO, a reliable RL framework for dLLMs that leverages tree-structured rollouts and bottom-up advantage computation based on verifiable outcome rewards to provide fine-grained, verifiable step-wise reward signals. Furthermore, we provide a theoretical proof that increasing prediction confidence effectively minimizes the gap between the unbiased expected prediction probability and its single-step forward-pass estimate. Guided by this analysis, we introduce a time-scheduled self-distillation loss that enhances prediction confidence in later training stages, thereby enabling more accurate probability estimation and better performance. Experiments demonstrate that d-TreeRPO outperforms existing baselines and achieves significant improvements across multiple reasoning benchmarks: +86.2% on Sudoku, +51.6% on Countdown, +4.5% on GSM8K, and +5.3% on Math500 compared to the base model.
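The idea of bottom-up advantage computation over tree-structured rollouts can be illustrated with a minimal Python sketch. It assumes, as a simplification that may differ from the paper's actual formulation, that each leaf carries a verifiable 0/1 outcome reward, that an internal node's value is the mean of its children's values, and that a step's advantage is the child's value minus its parent's value. Node, compute_values, and assign_advantages are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node in a tree-structured rollout (one intermediate state)."""
    children: List["Node"] = field(default_factory=list)
    reward: Optional[float] = None  # verifiable outcome reward, set only on leaves
    value: float = 0.0
    advantage: float = 0.0


def compute_values(node: Node) -> float:
    """Bottom-up pass: a leaf's value is its outcome reward; an internal node's
    value is the mean of its children's values (assumed aggregation rule)."""
    if not node.children:
        if node.reward is None:
            raise ValueError("every leaf needs an outcome reward")
        node.value = node.reward
    else:
        node.value = sum(compute_values(c) for c in node.children) / len(node.children)
    return node.value


def assign_advantages(node: Node) -> None:
    """Top-down pass: each child's step-wise advantage is its value minus its parent's."""
    for child in node.children:
        child.advantage = child.value - node.value
        assign_advantages(child)
```

For example, a root whose two branches end in leaf rewards (1, 1) and (1, 0) gets value 0.75; the first branch receives advantage +0.25 and the second -0.25. Each intermediate step thus gets its own signal even though rewards are only observed at the leaves.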
Deep Research with Open-Domain Evaluation and Multi-Stage Guardrails for Safety
Wei-Chieh Huang | Henry Peng Zou | Yaozu Wu | Dongyuan Li | Yankai Chen | Weizhi Zhang | Yangning Li | Angelo Zangari | Jizhou Guo | Chunyu Miao | Liancheng Fang | Langzhou He | Yinghui Li | Renhe Jiang | Philip S. Yu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Deep research frameworks have shown promising capabilities in synthesizing comprehensive reports from web sources. While deep research has significant potential to address complex issues through planning and research cycles, existing frameworks lack adequate evaluation procedures and stage-specific protections. They typically treat evaluation as exact-match question-answering accuracy, overlooking crucial aspects of report quality such as credibility, coherence, breadth, depth, and safety. This oversight may allow hazardous or malicious sources to be integrated into the final report. To address this, we introduce DeepResearchGuard, a framework featuring four-stage safeguards with open-domain evaluation, and DRSafeBench, a novel stage-wise safety benchmark. Evaluated across GPT-4o, o4-mini, Gemini-2.5-flash, DeepSeek-v3, and GPT-5, DeepResearchGuard improves defense success rates by an absolute 16.53% while reducing over-refusal rates to approximately 6%. Through extensive experiments, we show that DeepResearchGuard enables comprehensive open-domain evaluation and stage-aware defenses that effectively block harmful content propagation, while systematically improving report quality without excessive over-refusal.
2025
Multi-Agent Autonomous Driving Systems with Large Language Models: A Survey of Recent Advances, Resources, and Future Directions
Yaozu Wu | Dongyuan Li | Yankai Chen | Renhe Jiang | Henry Peng Zou | Wei-Chieh Huang | Yangning Li | Liancheng Fang | Zhen Wang | Philip S. Yu
Findings of the Association for Computational Linguistics: EMNLP 2025
Autonomous Driving Systems (ADSs) are revolutionizing transportation by reducing human intervention, improving operational efficiency, and enhancing safety. Large Language Models (LLMs), known for their exceptional planning and reasoning capabilities, have been integrated into ADSs to assist with driving decision-making. However, LLM-based single-agent ADSs face three major challenges: limited perception, insufficient collaboration, and high computational demands. To address these issues, recent LLM-based multi-agent ADSs have focused on improving inter-agent communication and cooperation. This paper surveys recent advances in LLM-based multi-agent ADSs. We begin with background on related concepts, followed by a categorization of existing LLM-based approaches by agent interaction mode. We then discuss agent-human interactions in scenarios where LLM-based agents engage with humans. Finally, we summarize key applications, datasets, and challenges in this field to support future research (https://github.com/Yaozuwu/LLM-based_Multi-agent_ADS).
TABGEN-ICL: Residual-Aware In-Context Example Selection for Tabular Data Generation
Liancheng Fang | Aiwei Liu | Hengrui Zhang | Henry Peng Zou | Weizhi Zhang | Philip S. Yu
Findings of the Association for Computational Linguistics: ACL 2025
Large language models (LLMs) have achieved encouraging results in tabular data generation. However, existing approaches require fine-tuning, which is computationally expensive. This paper explores an alternative: prompting a fixed LLM with in-context examples. We observe that using randomly selected in-context examples hampers the LLM’s performance, resulting in sub-optimal generation quality. To address this, we propose TabGen-ICL, a novel in-context learning framework that enhances the in-context learning ability of LLMs for tabular data generation. TabGen-ICL operates iteratively, retrieving a subset of real samples that represent the residual between the currently generated samples and the true data distribution. This approach serves two purposes: locally, it provides more effective in-context examples for the LLM in each iteration; globally, it progressively narrows the gap between generated and real data. Extensive experiments on five real-world tabular datasets demonstrate that TabGen-ICL significantly outperforms the random selection strategy; specifically, it reduces the error rate by up to 42.2% on the fidelity metric. We demonstrate for the first time that prompting a fixed LLM can yield high-quality synthetic tabular data.
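One round of residual-aware example selection can be sketched in Python under a deliberately simple assumption: the residual is measured on a single categorical column as the real share of each category minus its generated share, and in-context examples are drawn from the most under-represented categories first. This is an illustrative stand-in, not necessarily the residual criterion TabGen-ICL actually uses; Row and residual_examples are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Sequence

Row = Dict[str, str]  # hypothetical row type: column name -> categorical value


def residual_examples(real: Sequence[Row], generated: Sequence[Row],
                      column: str, k: int) -> List[Row]:
    """Return up to k real rows from the categories of `column` that the
    generated data under-represents most (assumed residual criterion)."""
    real_counts = Counter(row[column] for row in real)
    gen_counts = Counter(row[column] for row in generated)
    n_real = len(real)
    n_gen = max(len(generated), 1)  # no generated rows yet on the first iteration
    # Residual per category: real share minus generated share.
    residual = {c: real_counts[c] / n_real - gen_counts[c] / n_gen
                for c in real_counts}
    # Keep only categories the generator under-produces, largest gap first.
    under = sorted((c for c, r in residual.items() if r > 0),
                   key=lambda c: residual[c], reverse=True)
    picked: List[Row] = []
    for c in under:
        for row in real:
            if row[column] == c:
                picked.append(row)
                if len(picked) == k:
                    return picked
    return picked
```

In an iterative loop, the selected rows would be placed in the prompt, the LLM's new samples appended to the generated set, and the residual recomputed, so each round targets whatever the generated data still lacks.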
TestNUC: Enhancing Test-Time Computing Approaches and Scaling through Neighboring Unlabeled Data Consistency
Henry Peng Zou | Zhengyao Gu | Yue Zhou | Yankai Chen | Weizhi Zhang | Liancheng Fang | Yibo Wang | Yangning Li | Kay Liu | Philip S. Yu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Test-time computing approaches, which leverage additional computational resources during inference, have proven effective in enhancing large language model performance. This work introduces TestNUC, a novel, linearly scaling approach that improves test-time predictions by leveraging the local consistency of neighboring unlabeled data: it classifies an input instance by considering not only the model’s prediction on that instance but also its predictions on neighboring unlabeled instances. We evaluate TestNUC across eight diverse datasets spanning intent classification, topic mining, domain discovery, and emotion detection, demonstrating its consistent superiority over baseline methods such as standard prompting and self-consistency. Furthermore, TestNUC can be seamlessly integrated with existing test-time computing approaches, substantially boosting their performance. Our analysis reveals that TestNUC scales effectively with increasing amounts of unlabeled data and performs robustly across different embedding models, making it practical for real-world applications. Our code is available at https://github.com/HenryPengZou/TestNUC.
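The neighbor-consistency idea can be sketched as a majority vote in Python. It assumes that embeddings and model-predicted labels are already available for both the query and a pool of unlabeled instances, that neighbors are ranked by cosine similarity, and that every vote counts equally; the actual TestNUC aggregation may weight or combine predictions differently. cosine and neighbor_consistent_label are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neighbor_consistent_label(query_emb: Sequence[float], query_pred: str,
                              pool_embs: List[Sequence[float]],
                              pool_preds: List[str], k: int) -> str:
    """Majority vote over the query's own predicted label and the predicted
    labels of its k most similar unlabeled neighbors (equal weights assumed)."""
    ranked = sorted(range(len(pool_embs)),
                    key=lambda i: cosine(query_emb, pool_embs[i]),
                    reverse=True)
    votes = Counter([query_pred])  # the query's own prediction is counted first
    votes.update(pool_preds[i] for i in ranked[:k])
    # On a tie, most_common(1) returns the label counted first.
    return votes.most_common(1)[0][0]
```

Each query is compared once against every pool item, so the cost grows linearly with the amount of unlabeled data; with k=0 the function simply returns the model's own prediction.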
2024
ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction
Henry Peng Zou | Vinay Samuel | Yue Zhou | Weizhi Zhang | Liancheng Fang | Zihe Song | Philip S. Yu | Cornelia Caragea
Findings of the Association for Computational Linguistics: ACL 2024
Existing datasets for attribute value extraction (AVE) predominantly focus on explicit attribute values while neglecting implicit ones, lack product images, are often not publicly available, and lack in-depth human inspection across diverse domains. To address these limitations, we present ImplicitAVE, the first publicly available multimodal dataset for implicit attribute value extraction. ImplicitAVE, sourced from the MAVE dataset, is carefully curated and expanded to include implicit AVE and multimodality, resulting in a refined dataset of 68k training and 1.6k test instances across five domains. We also explore the application of multimodal large language models (MLLMs) to implicit AVE, establishing a comprehensive benchmark for MLLMs on the ImplicitAVE dataset. Six recent MLLMs with eleven variants are evaluated across diverse settings, revealing that implicit value extraction remains a challenging task for MLLMs. The contributions of this work include the development and release of ImplicitAVE and the exploration and benchmarking of various MLLMs for implicit AVE, providing valuable insights and potential future research directions. Dataset and code are available at https://github.com/HenryPengZou/ImplicitAVE.
Co-authors (number of shared publications listed above)
- Philip S. Yu 5
- Henry Peng Zou 5
- Weizhi Zhang 4
- Yankai Chen 3
- Yangning Li 3
- Wei-Chieh Huang 2
- Renhe Jiang 2
- Dongyuan Li 2
- Aiwei Liu 2
- Yaozu Wu 2
- Yue Zhou 2
- Cornelia Caragea 1
- Bolin Ding 1
- Zheyu Fu 1
- Zhengyao Gu 1
- Jizhou Guo 1
- Langzhou He 1
- Minghua He 1
- Yinghui Li 1
- Kay Liu 1
- Zhaoyang Liu 1
- Chunyu Miao 1
- Leyi Pan 1
- Vinay Samuel 1
- Zihe Song 1
- Shuchang Tao 1
- Yibo Wang 1
- Zhen Wang 1
- Lijie Wen 1
- Angelo Zangari 1
- Yunpeng Zhai 1
- Hengrui Zhang 1
- Lingzhe Zhang 1