Shuiwang Ji
2026
ReviewGrounder: Improving Review Substantiveness with Rubric-Guided, Tool-Integrated Agents
Zhuofeng Li | Yi Lu | Dongfu Jiang | Haoxiang Zhang | Yuyang Bai | Chuan Li | Yu Wang | Shuiwang Ji | Jianwen Xie | Yu Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The rapid rise in AI conference submissions has driven increasing exploration of large language models (LLMs) for peer review support. However, LLM-based reviewers often generate superficial, formulaic comments lacking substantive, evidence-grounded feedback. We attribute this to the underutilization of two key components of human reviewing: explicit rubrics and contextual grounding in existing work. To address this, we introduce ReviewBench, a benchmark evaluating review text according to paper-specific rubrics derived from official guidelines, the paper’s content, and human-written reviews. We further propose ReviewGrounder, a rubric-guided, tool-integrated multi-agent framework that decomposes reviewing into drafting and grounding stages, enriching shallow drafts via targeted evidence consolidation. Experiments on ReviewBench show that ReviewGrounder, using a Phi-4-14B-based drafter and a GPT-OSS-120B-based grounding stage, consistently outperforms baselines with substantially stronger/larger backbones (e.g., GPT-4.1 and DeepSeek-R1-670B) in both alignment with human judgments and rubric-based review quality across 8 dimensions. The code is available at https://github.com/EigenTom/ReviewGrounder.
2025
EcomScriptBench: A Multi-task Benchmark for E-commerce Script Planning via Step-wise Intention-Driven Product Association
Weiqi Wang | Limeng Cui | Xin Liu | Sreyashi Nag | Wenju Xu | Chen Luo | Sheikh Muhammad Sarwar | Yang Li | Hansu Gu | Hui Liu | Changlong Yu | Jiaxin Bai | Yifan Gao | Haiyang Zhang | Qi He | Shuiwang Ji | Yangqiu Song
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Goal-oriented script planning, or the ability to devise coherent sequences of actions toward specific goals, is commonly employed by humans to plan for typical activities. In e-commerce, customers increasingly seek LLM-based assistants to generate scripts and recommend products at each step, thereby facilitating convenient and efficient shopping experiences. However, this capability remains underexplored due to several challenges, including the inability of LLMs to simultaneously conduct script planning and product retrieval, difficulties in matching products caused by semantic discrepancies between planned actions and search queries, and a lack of methods and benchmark data for evaluation. In this paper, we step forward by formally defining the task of E-commerce Script Planning (EcomScript) as three sequential subtasks. We propose a novel framework that enables the scalable generation of product-enriched scripts by associating products with each step based on the semantic similarity between the actions and their purchase intentions. By applying our framework to real-world e-commerce data, we construct the very first large-scale EcomScript dataset, EcomScriptBench, which includes 605,229 scripts sourced from 2.4 million products. Human annotations are then conducted to provide gold labels for a sampled subset, forming an evaluation benchmark. Extensive experiments reveal that current (L)LMs face significant challenges with EcomScript tasks, even after fine-tuning, while injecting product purchase intentions improves their performance.
Reasoning with Graphs: Structuring Implicit Knowledge to Enhance LLMs Reasoning
Haoyu Han | Yaochen Xie | Hui Liu | Xianfeng Tang | Sreyashi Nag | William Headden | Yang Li | Chen Luo | Shuiwang Ji | Qi He | Jiliang Tang
Findings of the Association for Computational Linguistics: ACL 2025
Large language models (LLMs) have demonstrated remarkable success across a wide range of tasks; however, they still encounter challenges in reasoning tasks that require understanding and inferring relationships between distinct pieces of information within text sequences. This challenge is particularly pronounced in tasks involving multi-step processes, such as logical reasoning and multi-hop question answering, where understanding implicit relationships between entities and leveraging multi-hop connections in the given context are crucial. Graphs, as fundamental data structures, explicitly represent pairwise relationships between entities, thereby offering the potential to enhance LLMs’ reasoning capabilities. External graphs have proven effective in supporting LLMs across multiple tasks. However, in many reasoning tasks, no pre-existing graph structure is provided. Can we structure implicit knowledge derived from context into graphs to assist LLMs in reasoning? In this paper, we propose Reasoning with Graphs (RwG) by first constructing explicit graphs from the context and then leveraging these graphs to enhance LLM reasoning performance on reasoning tasks. Extensive experiments demonstrate the effectiveness of the proposed method in improving both logical reasoning and multi-hop question answering tasks.
2024
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery
Yu Zhang | Xiusi Chen | Bowen Jin | Sheng Wang | Shuiwang Ji | Wei Wang | Jiawei Han
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
In many scientific fields, large language models (LLMs) have revolutionized the way text and other modalities of data (e.g., molecules and proteins) are handled, achieving superior performance in various applications and augmenting the scientific discovery process. Nevertheless, previous surveys on scientific LLMs often concentrate on one or two fields or a single modality. In this paper, we aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs regarding their architectures and pre-training techniques. To this end, we comprehensively survey over 260 scientific LLMs, discuss their commonalities and differences, as well as summarize pre-training datasets and evaluation tasks for each field and modality. Moreover, we investigate how LLMs have been deployed to benefit scientific discovery. Resources related to this survey are available at https://github.com/yuzhimanhua/Awesome-Scientific-Language-Models.
Co-authors
- Qi He 2
- Yang Li 2
- Hui Liu 2
- Chen Luo 2
- Sreyashi Nag 2
- Yu Zhang 2
- Jiaxin Bai 1
- Yuyang Bai 1
- Xiusi Chen 1
- Limeng Cui 1
- Yifan Gao 1
- Hansu Gu 1
- Haoyu Han 1
- Jiawei Han 1
- William Headden 1
- Dongfu Jiang 1
- Bowen Jin 1
- Zhuofeng Li 1
- Chuan Li 1
- Xin Liu 1
- Yi Lu 1
- Sheikh Muhammad Sarwar 1
- Yangqiu Song 1
- Xianfeng Tang 1
- Jiliang Tang 1
- Weiqi Wang 1
- Sheng Wang 1
- Wei Wang 1
- Yu Wang 1
- Yaochen Xie 1
- Jianwen Xie 1
- Wenju Xu 1
- Changlong Yu 1
- Haiyang Zhang 1
- Haoxiang Zhang 1