Wenju Xu


2025

EcomScriptBench: A Multi-task Benchmark for E-commerce Script Planning via Step-wise Intention-Driven Product Association
Weiqi Wang | Limeng Cui | Xin Liu | Sreyashi Nag | Wenju Xu | Chen Luo | Sheikh Muhammad Sarwar | Yang Li | Hansu Gu | Hui Liu | Changlong Yu | Jiaxin Bai | Yifan Gao | Haiyang Zhang | Qi He | Shuiwang Ji | Yangqiu Song
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Goal-oriented script planning, or the ability to devise coherent sequences of actions toward specific goals, is commonly employed by humans to plan for typical activities. In e-commerce, customers increasingly seek LLM-based assistants to generate scripts and recommend products at each step, thereby facilitating convenient and efficient shopping experiences. However, this capability remains underexplored due to several challenges, including the inability of LLMs to simultaneously conduct script planning and product retrieval, difficulties in matching products caused by semantic discrepancies between planned actions and search queries, and a lack of methods and benchmark data for evaluation. In this paper, we step forward by formally defining the task of E-commerce Script Planning (EcomScript) as three sequential subtasks. We propose a novel framework that enables the scalable generation of product-enriched scripts by associating products with each step based on the semantic similarity between the actions and their purchase intentions. By applying our framework to real-world e-commerce data, we construct the very first large-scale EcomScript dataset, EcomScriptBench, which includes 605,229 scripts sourced from 2.4 million products. Human annotations are then conducted to provide gold labels for a sampled subset, forming an evaluation benchmark. Extensive experiments reveal that current (L)LMs face significant challenges with EcomScript tasks, even after fine-tuning, while injecting product purchase intentions improves their performance.

To Answer or Not to Answer (TAONA): A Robust Textual Graph Understanding and Question Answering Approach
Yuchen Yan | Aakash Kolekar | Sahika Genc | Wenju Xu | Edward W Huang | Anirudh Srinivasan | Mukesh Jain | Qi He | Hanghang Tong
Findings of the Association for Computational Linguistics: EMNLP 2025

Recently, textual graph-based retrieval-augmented generation (GraphRAG) has gained popularity for addressing hallucinations in large language models when answering domain-specific questions. Most existing studies assume that generated answers should comprehensively integrate all relevant information from the textual graph. However, this assumption may not always hold when certain information needs to be vetted or even blocked (e.g., due to safety concerns). In this paper, we target two sides of textual graph understanding and question answering: (1) normal question answering (A-side): following standard practices, this task generates accurate responses using all relevant information within the textual graph; and (2) blocked question answering (B-side): a new paradigm where the GraphRAG model must effectively infer and exclude specific relevant information from the generated response. To address these dual tasks, we propose TAONA, a novel GraphRAG model with two variants: (1) TAONA-A for the A-side task, which incorporates a specialized GraphEncoder to learn graph prompting vectors; and (2) TAONA-B for the B-side task, which employs semi-supervised node classification to infer potentially blocked graph nodes. Extensive experiments validate TAONA’s superior performance for both A-side and B-side tasks.