2025
TreeReview: A Dynamic Tree of Questions Framework for Deep and Efficient LLM-based Scientific Peer Review
Yuan Chang | Ziyue Li | Hengyuan Zhang | Yuanbo Kong | Yanru Wu | Hayden Kwok-Hay So | Zhijiang Guo | Liya Zhu | Ngai Wong
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
While Large Language Models (LLMs) have shown significant potential in assisting peer review, current methods often struggle to generate thorough and insightful reviews while maintaining efficiency. In this paper, we propose TreeReview, a novel framework that models paper review as a hierarchical and bidirectional question-answering process. TreeReview first constructs a tree of review questions by recursively decomposing high-level questions into fine-grained sub-questions, and then resolves the question tree by iteratively aggregating answers from leaves to root to produce the final review. Crucially, we incorporate a dynamic question expansion mechanism that enables deeper probing by generating follow-up questions when needed. We construct a benchmark derived from ICLR and NeurIPS venues to evaluate our method on full review generation and actionable feedback comment generation. Results from both LLM-based and human evaluation show that TreeReview outperforms strong baselines in providing comprehensive, in-depth, and expert-aligned review feedback, while reducing LLM token usage by up to 80% compared to computationally intensive approaches.
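To make the two phases concrete, here is a minimal Python sketch of the question-tree idea as the abstract describes it: top-down decomposition, bottom-up aggregation, and a one-shot dynamic expansion for shallow answers. `ask_llm`, the prompt templates, and the depth/expansion heuristics are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class QuestionNode:
    question: str
    children: list["QuestionNode"] = field(default_factory=list)
    answer: str = ""
    expanded: bool = False

def ask_llm(prompt: str) -> str:
    """Stand-in for any chat-completion call; plug a real client in here."""
    raise NotImplementedError

def build_tree(question: str, depth: int = 0, max_depth: int = 2) -> QuestionNode:
    """Top-down phase: recursively decompose a question into sub-questions."""
    node = QuestionNode(question)
    if depth < max_depth:
        subs = ask_llm(f"List fine-grained sub-questions, one per line, for:\n{question}")
        node.children = [build_tree(s.strip(), depth + 1, max_depth)
                         for s in subs.splitlines() if s.strip()]
    return node

def resolve(node: QuestionNode, paper: str) -> str:
    """Bottom-up phase: answer leaves, expand dynamically, aggregate to the root."""
    if not node.children:
        node.answer = ask_llm(f"Paper:\n{paper}\n\nAnswer this review question:\n{node.question}")
        verdict = ask_llm(f"Does this answer need deeper probing? YES or NO.\n{node.answer}")
        if node.expanded or not verdict.strip().upper().startswith("YES"):
            return node.answer
        # Dynamic expansion: one follow-up question, marked so it cannot expand again.
        follow_up = ask_llm(f"Write one follow-up question for:\n{node.question}")
        node.children = [QuestionNode(follow_up, expanded=True)]
    answers = [node.answer] if node.answer else []
    answers += [resolve(child, paper) for child in node.children]
    node.answer = ask_llm(f"Aggregate these sub-answers into one answer to "
                          f"'{node.question}':\n" + "\n".join(answers))
    return node.answer

# final_review = resolve(build_tree("Should this paper be accepted?"), paper_text)
```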
GuiLoMo: Allocating Experts and Ranks for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors
Xinrong Chen | Hengyuan Zhang | Yingmin Qiu | Xiao Liang | Ziyue Li | Guanyu Wang | Weiping Li | Tong Mo | Hayden Kwok-Hay So | Ngai Wong
Findings of the Association for Computational Linguistics: EMNLP 2025
Parameter-efficient fine-tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), offer an efficient way to adapt large language models at reduced computational cost. However, their performance is limited by the small number of trainable parameters. Recent work combines LoRA with Mixture-of-Experts (MoE), i.e., LoRA-MoE, to enhance capacity, but two limitations hinder the full exploitation of its potential: 1) expert numbers are assigned without accounting for the influence of downstream tasks, and 2) all LoRA experts receive a uniform rank, which restricts representational diversity. To mitigate these gaps, we propose GuiLoMo, a fine-grained, layer-wise strategy for allocating expert numbers and ranks with GuidedSelection Vectors (GSVs). GSVs are learned via a prior bilevel optimization process to capture both model- and task-specific needs, and are then used to allocate optimal expert numbers and ranks. Experiments on three backbone models across diverse benchmarks show that GuiLoMo consistently achieves performance superior or comparable to all baselines. Further analysis offers key insights into how expert numbers and ranks vary across layers and tasks, highlighting the benefits of adaptive expert configuration. Our code is available at https://anonymous.4open.science/r/GuiLoMo-034.
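A hedged sketch of what a GuidedSelection Vector could look like, following the abstract: a learnable per-layer vector whose softmax scores candidate expert numbers and ranks during the bilevel search, and whose argmax fixes the allocation afterward. The candidate grids, layer count, and loop structure are assumptions for illustration.

```python
import torch

# Candidate grids are assumptions; the paper searches per-layer expert numbers and ranks.
CANDIDATE_EXPERTS = [2, 4, 8]
CANDIDATE_RANKS = [2, 4, 8, 16]
NUM_LAYERS = 12

# One GuidedSelection Vector per layer and per decision (expert number, rank).
expert_gsv = torch.nn.Parameter(torch.zeros(NUM_LAYERS, len(CANDIDATE_EXPERTS)))
rank_gsv = torch.nn.Parameter(torch.zeros(NUM_LAYERS, len(CANDIDATE_RANKS)))

def soft_choice(gsv: torch.Tensor, candidates: list[int]) -> torch.Tensor:
    """Search phase: mix candidates by the GSV's softmax weights, keeping the
    choice differentiable for the outer (validation-loss) update."""
    weights = torch.softmax(gsv, dim=-1)                       # (layers, candidates)
    return weights @ torch.tensor(candidates, dtype=gsv.dtype)  # expected value per layer

def discretize() -> list[tuple[int, int]]:
    """After the bilevel search, the argmax of each GSV fixes the allocation."""
    e = expert_gsv.argmax(dim=-1).tolist()
    r = rank_gsv.argmax(dim=-1).tolist()
    return [(CANDIDATE_EXPERTS[i], CANDIDATE_RANKS[j]) for i, j in zip(e, r)]

# Bilevel loop (schematic): the inner step trains LoRA-expert weights on the
# training split; the outer step backpropagates validation loss into the GSVs.
```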
Reasoning RAG via System 1 or System 2: A Survey on Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges
Jintao Liang | Gang Su | Huifeng Lin | You Wu | Rui Zhao | Ziyue Li
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Retrieval-Augmented Generation (RAG) has emerged as a powerful framework to overcome the knowledge limitations of Large Language Models (LLMs) by integrating external retrieval with language generation. While early RAG systems based on static pipelines have shown effectiveness in well-structured tasks, they struggle in real-world scenarios requiring complex reasoning, dynamic retrieval, and multi-modal integration. To address these challenges, the field has shifted toward Reasoning Agentic RAG, a paradigm that embeds decision-making and adaptive tool use directly into the retrieval process. In this paper, we present a comprehensive review of Reasoning Agentic RAG methods, categorizing them into two primary paradigms: predefined reasoning, which follows fixed modular pipelines to boost reasoning, and agentic reasoning, where the model autonomously orchestrates tool interaction during inference. We analyze representative techniques under both paradigms, covering architectural design, reasoning strategies, and tool coordination. Finally, we discuss key research challenges and propose future directions to advance the flexibility, robustness, and applicability of reasoning agentic RAG systems.
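The contrast between the two paradigms is easy to show in code. Below is an illustrative sketch, not drawn from the survey itself: `llm` and `search` are hypothetical callables, and the action format is an assumption; the point is the control flow, which is fixed in the first function and model-driven in the second.

```python
def predefined_rag(query: str, llm, search) -> str:
    """Predefined reasoning: a fixed retrieve-then-read pipeline."""
    docs = search(query)
    return llm(f"Context:\n{docs}\n\nQuestion: {query}")

def agentic_rag(query: str, llm, search, max_steps: int = 4) -> str:
    """Agentic reasoning: the model decides at inference time when and what to retrieve."""
    scratchpad = ""
    for _ in range(max_steps):
        step = llm(
            f"Question: {query}\n{scratchpad}\n"
            "Reply with SEARCH: <query> to retrieve, or ANSWER: <answer> to finish."
        )
        if step.startswith("SEARCH:"):
            scratchpad += f"\nObservation: {search(step.removeprefix('SEARCH:').strip())}"
        else:
            return step.removeprefix("ANSWER:").strip()
    return llm(f"Question: {query}\n{scratchpad}\nGive the best final answer.")
```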
Sparser Mixture-of-Adapters with Cross-Layer Generalization
Ziyue Li | Tianyi Zhou
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
2024
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Industry Systems
Yilun Kong | Jingqing Ruan | Yihong Chen | Bin Zhang | Tianpeng Bao | Shiwei Shi | Guoqing Du | Xiaoru Hu | Hangyu Mao | Ziyue Li | Xingyu Zeng | Rui Zhao | Xueqian Wang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Large Language Models (LLMs) have demonstrated proficiency in addressing tasks that necessitate a combination of task planning and the usage of external tools, such as weather and calculator APIs. However, real-world industrial systems present prevalent challenges in task planning and tool usage: the sheer number of APIs in a real system makes it intricate to invoke the appropriate one, while the inherent limitations of LLMs pose challenges in orchestrating an accurate sub-task sequence and API-calling order. This paper introduces a comprehensive framework aimed at enhancing the Task Planning and Tool Usage (TPTU) abilities of LLM-based agents in industry. Our framework comprises three key components designed to address these challenges: (1) the API Retriever selects the most pertinent APIs from the extensive API set; (2) the Demo Selector retrieves task-level demonstrations, which are then used for in-context learning to help LLMs accurately decompose sub-tasks and invoke hard-to-distinguish APIs; (3) the LLM Finetuner tunes a base LLM to enhance its capability for task planning and API calling. We validate our methods on a real-world industry system and an open-source academic dataset, demonstrating the efficacy of each individual component as well as of the integrated framework. The code is available here.
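A minimal sketch of how the three components could compose, under stated assumptions: `embed` and `finetuned_llm` are hypothetical stand-ins, both retrievers are reduced to cosine similarity over embeddings, and the `doc`/`task`/`solution` dict keys are invented for illustration.

```python
import numpy as np

def top_k(query_vec: np.ndarray, item_vecs: np.ndarray, items: list, k: int) -> list:
    """Rank items by cosine similarity to the query embedding."""
    sims = item_vecs @ query_vec
    sims = sims / (np.linalg.norm(item_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [items[i] for i in np.argsort(-sims)[:k]]

def run_task(task: str, apis: list[dict], demos: list[dict], embed, finetuned_llm) -> str:
    q = embed(task)
    # (1) API Retriever: narrow a large API set down to the most pertinent few.
    top_apis = top_k(q, np.stack([embed(a["doc"]) for a in apis]), apis, k=5)
    # (2) Demo Selector: fetch task-level demonstrations for in-context learning.
    top_demos = top_k(q, np.stack([embed(d["task"]) for d in demos]), demos, k=3)
    prompt = (
        "APIs:\n" + "\n".join(a["doc"] for a in top_apis)
        + "\n\nDemonstrations:\n" + "\n".join(d["solution"] for d in top_demos)
        + f"\n\nTask: {task}\nPlan the sub-tasks and the API-calling order."
    )
    # (3) LLM Finetuner: the base model is assumed already tuned for planning/calling.
    return finetuned_llm(prompt)
```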
Guiding Large Language Models via External Attention Prompting for Scientific Extreme Summarization
Yuan Chang | Ziyue Li | Xiaoqiu Le
Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)
Scientific extreme summarization, the task of generating concise one-sentence summaries (TLDRs) for scientific papers, presents significant challenges due to the need for deep domain-specific understanding and the ability to distill salient information. This study identifies the critical role of titles and keywords in enhancing TLDR generation through quantitative analysis. We propose a novel method, External Attention Prompting (EAP), which leverages LLMs by guiding them to focus on the most critical parts of the source text through varying degrees of attention signals. Our method employs Markdown emphasis syntax to annotate attention levels, enabling LLMs to prioritize salient information effectively. Extensive experiments demonstrate that EAP significantly outperforms baseline methods across various LLMs and metrics in both zero-shot and few-shot settings. Further evaluation by GPT-4 demonstrates that EAP enables LLMs to generate TLDRs of higher, more human-aligned quality.
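The annotation scheme is simple enough to sketch directly. The two-level mapping below (bold = high attention, italic = medium) and the prompt wording are assumptions; the abstract only states that Markdown emphasis encodes varying attention degrees, with titles and keywords as the salient spans.

```python
def annotate(text: str, high: list[str], medium: list[str]) -> str:
    """Wrap salient spans in Markdown emphasis: bold = high, italic = medium attention.
    Naive string replacement; overlapping spans may double-wrap in this sketch."""
    for span in high:
        text = text.replace(span, f"**{span}**")
    for span in medium:
        text = text.replace(span, f"*{span}*")
    return text

def eap_prompt(paper: str, title: str, keywords: list[str]) -> str:
    body = annotate(paper, high=[title] + keywords, medium=[])
    return (
        "Write a one-sentence TLDR of the paper below. Spans in **bold** carry the "
        "highest attention; spans in *italic* carry medium attention.\n\n" + body
    )
```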
Simulating Expert Discussions with Multi-agent for Enhanced Scientific Problem Solving
Ziyue Li | Yuan Chang | Xiaoqiu Le
Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)
Large Language Models (LLMs) have shown remarkable potential across various domains, yet their application in addressing complex scientific problems remains a formidable challenge. This paper presents a novel methodology to augment the problem-solving capabilities of LLMs by assigning them roles as domain-specific experts. By simulating a panel of experts, each LLM is tasked with delivering professional and cautious responses to scientific inquiries. Our approach involves querying multiple LLMs and assessing the consistency of their responses. High agreement among the LLMs suggests greater confidence in the proposed solution, whereas discrepancies prompt a collaborative discussion among the LLMs to reach a consensus. This method emulates real-world scientific problem-solving processes, fostering a more reliable and robust mechanism for LLMs to tackle scientific questions. Our experimental results show that assigning roles to multiple LLMs as domain-specific experts significantly improves their accuracy and reliability in solving scientific problems. This framework has the potential to advance the application of AI in scientific research, enhancing its effectiveness and trustworthiness.
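The query-then-check-consistency loop reads naturally as code. A minimal sketch under stated assumptions: `experts` is a list of role-prompted LLM callables, exact string match serves as the agreement test, and the discussion prompt and round limit are illustrative, not the authors' protocol.

```python
from collections import Counter

def solve(question: str, experts: list, rounds: int = 2) -> str:
    """Query role-prompted expert LLMs; discuss until the panel converges."""
    answers = [ask(f"As a domain-specific expert, answer cautiously: {question}")
               for ask in experts]
    for _ in range(rounds):
        best, votes = Counter(answers).most_common(1)[0]
        if votes == len(answers):  # full agreement signals high confidence
            return best
        # Discrepancy: share the transcript and let the experts revise toward consensus.
        transcript = "\n".join(f"Expert {i + 1}: {a}" for i, a in enumerate(answers))
        answers = [
            ask(f"{transcript}\n\nConsidering the other experts' views, "
                f"revise your answer to: {question}")
            for ask in experts
        ]
    return Counter(answers).most_common(1)[0][0]  # fall back to the majority answer
```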