Wei-Chieh Huang
2026
MADIAVE: Multi-Agent Debate for Implicit Attribute Value Extraction
Wei-Chieh Huang | Cornelia Caragea
Findings of the Association for Computational Linguistics: EACL 2026
Implicit Attribute Value Extraction (AVE) is essential for accurately representing products in e-commerce, as it infers latent attributes from multimodal data. Despite advances in multimodal large language models (MLLMs), implicit AVE remains challenging due to the complexity of multidimensional data and gaps in vision-text understanding. In this work, we introduce MADIAVE, a multi-agent debate framework that employs multiple MLLM agents to iteratively refine inferences. Through a series of debate rounds, agents verify and update each other’s responses, thereby improving inference performance and robustness. Experiments on the ImplicitAVE dataset demonstrate that even a few rounds of debate significantly boost accuracy, especially for attributes with initially low performance. We systematically evaluate various debate configurations, including identical or different MLLM agents, and analyze how debate rounds affect convergence dynamics. Our findings highlight the potential of multi-agent debate strategies to address the limitations of single-agent approaches and offer a scalable solution for implicit AVE in multimodal e-commerce.
Deep Research with Open-Domain Evaluation and Multi-Stage Guardrails for Safety
Wei-Chieh Huang | Henry Peng Zou | Yaozu Wu | Dongyuan Li | Yankai Chen | Weizhi Zhang | Yangning Li | Angelo Zangari | Jizhou Guo | Chunyu Miao | Liancheng Fang | Langzhou He | Yinghui Li | Renhe Jiang | Philip S. Yu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Deep research frameworks have shown promising capabilities in synthesizing comprehensive reports from web sources. While deep research possesses significant potential to address complex issues through planning and research cycles, existing frameworks lack adequate evaluation procedures and stage-specific protections. They typically treat evaluation as exact-match accuracy on question answering, but overlook crucial aspects of report quality such as credibility, coherence, breadth, depth, and safety. This oversight may result in hazardous or malicious sources being integrated into the final report. To address this, we introduce DeepResearchGuard, a framework featuring four-stage safeguards with open-domain evaluation, and DRSafeBench, a novel stage-wise safety benchmark. Evaluating across GPT-4o, o4-mini, Gemini-2.5-flash, DeepSeek-v3, and GPT-5, DeepResearchGuard improves defense success rates by an absolute 16.53% while reducing over-refusal rates to approximately 6%. Through extensive experiments, we show that DeepResearchGuard enables comprehensive open-domain evaluation and stage-aware defenses that effectively block harmful content propagation, while systematically improving report quality without excessive over-refusal rates.
2025
Multi-Agent Autonomous Driving Systems with Large Language Models: A Survey of Recent Advances, Resources, and Future Directions
Yaozu Wu | Dongyuan Li | Yankai Chen | Renhe Jiang | Henry Peng Zou | Wei-Chieh Huang | Yangning Li | Liancheng Fang | Zhen Wang | Philip S. Yu
Findings of the Association for Computational Linguistics: EMNLP 2025
Autonomous Driving Systems (ADSs) are revolutionizing transportation by reducing human intervention, improving operational efficiency, and enhancing safety. Large Language Models (LLMs), known for their exceptional planning and reasoning capabilities, have been integrated into ADSs to assist with driving decision-making. However, LLM-based single-agent ADSs face three major challenges: limited perception, insufficient collaboration, and high computational demands. To address these issues, recent advancements in LLM-based multi-agent ADSs have focused on improving inter-agent communication and cooperation. This paper provides a frontier survey of LLM-based multi-agent ADSs. We begin with a background introduction to related concepts, followed by a categorization of existing LLM-based approaches based on different agent interaction modes. We then discuss agent-human interactions in scenarios where LLM-based agents engage with humans. Finally, we summarize key applications, datasets, and challenges in this field to support future research (https://github.com/Yaozuwu/LLM-based_Multi-agent_ADS).
A Survey of RAG-Reasoning Systems in Large Language Models
Yangning Li | Weizhi Zhang | Yuyao Yang | Wei-Chieh Huang | Yaozu Wu | Junyu Luo | Yuanchen Bei | Henry Peng Zou | Xiao Luo | Yusheng Zhao | Chunkit Chan | Yankai Chen | Zhongfen Deng | Yinghui Li | Hai-Tao Zheng | Dongyuan Li | Renhe Jiang | Ming Zhang | Yangqiu Song | Philip S. Yu
Findings of the Association for Computational Linguistics: EMNLP 2025
Retrieval-Augmented Generation (RAG) lifts the factuality of Large Language Models (LLMs) by injecting external knowledge, yet it falls short on problems that demand multi-step inference; conversely, purely reasoning-oriented approaches often hallucinate or mis-ground facts. This survey synthesizes both strands under a unified reasoning-search perspective. We first map how advanced reasoning optimizes each stage of RAG (Reasoning-Enhanced RAG). Then, we show how different types of retrieved knowledge supply missing premises and expand context for complex inference (RAG-Enhanced Reasoning). Finally, we spotlight emerging Synergized RAG-Reasoning frameworks, where (agentic) LLMs iteratively interleave search and thought to achieve state-of-the-art performance across knowledge-intensive benchmarks. We categorize methods, datasets, and open challenges, and outline research avenues toward deeper RAG-Reasoning systems that are more effective, multimodally adaptive, trustworthy, and human-centric.
Teaching According to Talents! Instruction Tuning LLMs with Competence-Aware Curriculum Learning
Yangning Li | Tingwei Lu | Yinghui Li | Yankai Chen | Wei-Chieh Huang | Wenhao Jiang | Hui Wang | Hai-Tao Zheng | Philip S. Yu
Findings of the Association for Computational Linguistics: EMNLP 2025
Efficient instruction tuning aims to enhance the ultimate performance of large language models (LLMs) trained on a given instruction dataset. Curriculum learning, as a typical data organization strategy, has shown preliminary effectiveness in instruction tuning. However, current curriculum tuning methods suffer from curriculum rigidity, since they rely solely on static heuristic difficulty metrics. These methods fail to adapt to the evolving capabilities of models during training, resulting in a fixed and potentially sub-optimal learning trajectory. To address this issue, a **C**ompetence-**A**ware **M**ulti-**P**erspective c**U**rriculum in**S**truction tuning framework, termed **CAMPUS**, is proposed. CAMPUS offers several advantages: (1) Dynamic selection for sub-curriculum. (2) Competency-aware adjustment to the curriculum schedule. (3) Multiple difficulty-based scheduling. Extensive experiments demonstrate the superior performance of CAMPUS compared to other state-of-the-art baselines for efficient instruction tuning.
Co-authors
- Yankai Chen 4
- Yangning Li 4
- Philip S. Yu 4
- Renhe Jiang 3
- Dongyuan Li 3
- Yinghui Li 3
- Yaozu Wu 3
- Henry Peng Zou 3
- Liancheng Fang 2
- Weizhi Zhang 2
- Hai-Tao Zheng 2
- Yuanchen Bei 1
- Cornelia Caragea 1
- Chunkit Chan 1
- Zhongfen Deng 1
- Jizhou Guo 1
- Langzhou He 1
- Wenhao Jiang 1
- Tingwei Lu 1
- Junyu Luo 1
- Xiao Luo 1
- Chunyu Miao 1
- Yangqiu Song 1
- Hui Wang 1
- Zhen Wang 1
- Yuyao Yang 1
- Angelo Zangari 1
- Ming Zhang 1
- Yusheng Zhao 1