Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents

Xubo Lin, Zezhi Deng, Shihao Wang, Grace Hui Yang, Yang Deng


Abstract
Most existing dialogue systems are user-driven, primarily designed to fulfill user requests. However, in many critical real-world scenarios, a conversational agent must proactively extract information to achieve its own objectives rather than merely respond. To address this gap, we introduce Inquisitive Conversational Agents (ICAs) and develop an ICA specifically tailored to U.S. Supreme Court oral arguments. We propose a Dual Hierarchical Reinforcement Learning framework featuring two cooperating RL agents, each with its own policy, to coordinate strategic dialogue management and fine-grained utterance generation. By learning when and how to ask probing questions, the agent emulates judicial questioning patterns and systematically uncovers crucial information to fulfill its legal objectives. Evaluations on a U.S. Supreme Court dataset show that our method outperforms single-agent RL baselines on multiple metrics. Although specialized to a single legal domain, this work represents an important first step toward broader high-stakes, domain-specific applications. Part of the code is attached as supplementary material, and all code will be released upon publication for reproducibility.
Anthology ID:
2026.findings-acl.536
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11030–11047
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.536/
Cite (ACL):
Xubo Lin, Zezhi Deng, Shihao Wang, Grace Hui Yang, and Yang Deng. 2026. Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents. In Findings of the Association for Computational Linguistics: ACL 2026, pages 11030–11047, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents (Lin et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.536.pdf
Checklist:
 2026.findings-acl.536.checklist.pdf