Zihan Li
Papers on this page may belong to the following people: Zihan Li, Zihan Li
2026
DPEPO: Diverse Parallel Exploration Policy Optimization for LLM-based Agents
JunShuo Zhang | Chengrui Huang | Feng Guo | Zihan Li | Ke Shi | Menghua Jiang | Jiguo Yu | Shuo Shang | Shen Gao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
JunShuo Zhang | Chengrui Huang | Feng Guo | Zihan Li | Ke Shi | Menghua Jiang | Jiguo Yu | Shuo Shang | Shen Gao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language model (LLM) agents that follow the sequential “reason-then-act” paradigm have achieved superior performance in many complex tasks. However, these methods suffer from limited exploration and incomplete environmental understanding, as they interact with only a single environment per step. In this paper, we first introduce a novel paradigm that enables an agent to interact with multiple environments simultaneously and share cross-trajectory experiences. Build upon this paradigm, we further propose Diverse Parallel Exploration Policy Optimization (DPEPO), a reinforcement learning (RL) algorithm that encourages the agent to perform diverse parallel exploration. There are two stages in DPEPO: initial supervised fine-tuning (SFT) imparts basic parallel reasoning and action generation, followed by reinforcement learning stage with a hierarchical reward scheme. We design a parallel trajectory-level success reward and two step-level rewards: Diverse Action Reward and Diverse State Transition Reward, which actively penalize behavioral redundancy and promote broad exploration. Extensive experiments on ALFWorld and ScienceWorld show that DPEPO achieves state-of-the-art (SOTA) success rates, while maintaining comparable efficiency to strong sequential baselines.
2023
Chat Disentanglement: Data for New Domains and Methods for More Accurate Annotation
Sai R. Gouravajhala | Andrew M. Vernier | Yiming Shi | Zihan Li | Mark S. Ackerman | Jonathan K. Kummerfeld
Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association
Sai R. Gouravajhala | Andrew M. Vernier | Yiming Shi | Zihan Li | Mark S. Ackerman | Jonathan K. Kummerfeld
Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association
Conversation disentanglement is the task of taking a log of intertwined conversations from a shared channel and breaking the log into individual conversations. The standard datasets for disentanglement are in a single domain and were annotated by linguistics experts with careful training for the task. In this paper, we introduce the first multi-domain dataset and a study of annotation by people without linguistics expertise or extensive training. We experiment with several variations in interfaces, conducting user studies with domain experts and crowd workers. We also test a hypothesis from prior work that link-based annotation is more accurate, finding that it actually has comparable accuracy to set-based annotation. Our new dataset will support the development of more useful systems for this task, and our experimental findings suggest that users are capable of improving the usefulness of these systems by accurately annotating their own data.
2019
CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
Tao Yu | Rui Zhang | Heyang Er | Suyi Li | Eric Xue | Bo Pang | Xi Victoria Lin | Yi Chern Tan | Tianze Shi | Zihan Li | Youxuan Jiang | Michihiro Yasunaga | Sungrok Shim | Tao Chen | Alexander Fabbri | Zifan Li | Luyao Chen | Yuwen Zhang | Shreya Dixit | Vincent Zhang | Caiming Xiong | Richard Socher | Walter Lasecki | Dragomir Radev
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Tao Yu | Rui Zhang | Heyang Er | Suyi Li | Eric Xue | Bo Pang | Xi Victoria Lin | Yi Chern Tan | Tianze Shi | Zihan Li | Youxuan Jiang | Michihiro Yasunaga | Sungrok Shim | Tao Chen | Alexander Fabbri | Zifan Li | Luyao Chen | Yuwen Zhang | Shreya Dixit | Vincent Zhang | Caiming Xiong | Richard Socher | Walter Lasecki | Dragomir Radev
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
We present CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing of unanswerable questions. When user questions are answerable by SQL, the expert describes the SQL and execution results to the user, hence maintaining a natural interaction flow. CoSQL introduces new challenges compared to existing task-oriented dialogue datasets: (1) the dialogue states are grounded in SQL, a domain-independent executable representation, instead of domain-specific slot value pairs, and (2) because testing is done on unseen databases, success requires generalizing to new domains. CoSQL includes three tasks: SQL-grounded dialogue state tracking, response generation from query results, and user dialogue act prediction. We evaluate a set of strong baselines for each task and show that CoSQL presents significant challenges for future research. The dataset, baselines, and leaderboard will be released at https://yale-lily.github.io/cosql.
Search
Fix author
Co-authors
- Mark S. Ackerman 1
- Tao Chen 1
- Luyao Chen 1
- Shreya Dixit 1
- Heyang Er 1
- Shen Gao 1
- Sai R. Gouravajhala 1
- Feng Guo 1
- Chengrui Huang 1
- Menghua Jiang 1
- Youxuan Jiang 1
- Jonathan K. Kummerfeld 1
- Walter Lasecki 1
- Suyi Li 1
- Zifan Li 1
- Xi Victoria Lin 1
- Bo Pang 1
- Dragomir Radev 1
- Alexander Richard Fabbri 1
- Shuo Shang 1
- Ke Shi 1
- Tianze Shi 1
- Yiming Shi 1
- Sungrok Shim 1
- Richard Socher 1
- Yi Chern Tan 1
- Andrew M. Vernier 1
- Caiming Xiong 1
- Eric Xue 1
- Michihiro Yasunaga 1
- Jiguo Yu 1
- Tao Yu 1
- JunShuo Zhang 1
- Rui Zhang 1
- Yuwen Zhang 1
- Vincent Zhang 1