Goal-oriented dialogues, such as recommendation and negotiation, often require balancing multiple, conflicting objectives. Existing methods typically train a separate model for each specific combination of objectives, which leads to computational and scalability issues. In this work, we aim to develop a new dialogue policy method that can adapt to varying objective preferences at inference time without retraining. This raises challenges in terms of both (1) the optimization strategy and (2) knowledge utilization. To address these, we propose a novel learning framework, the Preference Adaptive Dialogue Policy Planner (PADPP), for multi-objective goal-oriented dialogues. To tackle the former, we introduce a novel policy optimization scheme that leverages information gained from training the model on previously updated objective weights, accelerating learning on new weight settings. To address the latter, we utilize Generalized Policy Improvement (GPI) to ensure that the leveraged knowledge is used effectively. Experimental results demonstrate that PADPP achieves superior adaptability and performance compared to state-of-the-art approaches, offering a scalable and flexible solution for multi-objective, goal-oriented dialogues. Code and data are available via an anonymous link.
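The GPI component admits a compact illustration. The sketch below shows the standard GPI action-selection rule, which picks the action whose best scalarized value across a set of stored policies is highest for a given preference weight vector; the array shapes, variable names, and toy values are illustrative assumptions, not PADPP's actual implementation.

```python
import numpy as np

def gpi_action(q_values: np.ndarray, w: np.ndarray) -> int:
    """Select an action with Generalized Policy Improvement (GPI).

    q_values: shape (n_policies, n_actions, n_objectives); q_values[i, a] is
        policy i's vector-valued Q-estimate for action a in the current state.
    w: preference weights over the objectives, shape (n_objectives,).
    Returns argmax_a max_i  w . Q_i(s, a), i.e. the action whose best
    scalarized value over the stored policies is highest.
    """
    scalarized = q_values @ w                    # (n_policies, n_actions)
    best_over_policies = scalarized.max(axis=0)  # max over stored policies
    return int(best_over_policies.argmax())

# Toy example: 2 stored policies, 3 dialogue actions, 2 objectives.
q = np.array([[[0.2, 0.9], [0.5, 0.1], [0.3, 0.3]],
              [[0.6, 0.2], [0.1, 0.8], [0.4, 0.4]]])
print(gpi_action(q, w=np.array([0.7, 0.3])))  # weights favour the first objective
```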
Target-driven recommendation dialogues present unique challenges in dialogue management due to the necessity of anticipating user interactions for successful conversations. Current methods face significant limitations: (I) inadequate capabilities for conversation anticipation, (II) computational inefficiencies due to costly simulations, and (III) neglect of valuable past dialogue experiences. To address these limitations, we propose a new framework, Experiential Policy Learning (EPL), for enhancing such dialogues. EPL embodies the principle of Learning From Experience, facilitating anticipation with an experiential scoring function that estimates dialogue state potential using similar past interactions stored in long-term memory. To demonstrate its flexibility, we introduce Tree-structured EPL (T-EPL) as one possible training-free realization with Large Language Models (LLMs) and Monte-Carlo Tree Search (MCTS). T-EPL assesses past dialogue states with LLMs while utilizing MCTS to achieve hierarchical and multi-level reasoning. Extensive experiments on two published datasets demonstrate the superiority and efficacy of T-EPL.
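A minimal sketch of an experiential scoring function in the spirit described above: the current dialogue state is scored by a similarity-weighted average of the outcomes of the most similar past states retrieved from long-term memory. The embedding representation, cosine-similarity retrieval, and binary outcome labels are assumptions for illustration, not EPL's or T-EPL's exact design.

```python
import numpy as np

def experiential_score(state_emb: np.ndarray,
                       memory_embs: np.ndarray,
                       memory_outcomes: np.ndarray,
                       k: int = 5) -> float:
    """Estimate a dialogue state's potential from similar past experiences.

    state_emb: embedding of the current dialogue state, shape (d,).
    memory_embs: embeddings of past dialogue states in long-term memory, (n, d).
    memory_outcomes: outcome of each past state (e.g., 1.0 if the target was
        reached, 0.0 otherwise), shape (n,).
    Returns a similarity-weighted average outcome over the k nearest neighbours.
    """
    sims = memory_embs @ state_emb / (
        np.linalg.norm(memory_embs, axis=1) * np.linalg.norm(state_emb) + 1e-8)
    top = np.argsort(sims)[-k:]                      # k most similar past states
    weights = np.maximum(sims[top], 0.0) + 1e-8      # keep weights non-negative
    return float(np.average(memory_outcomes[top], weights=weights))

# Toy example: three past states in memory, 2-d embeddings.
mem = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
outcomes = np.array([1.0, 0.0, 1.0])
print(experiential_score(np.array([0.9, 0.1]), mem, outcomes, k=2))
```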
SemEval-2021 Task 5: Toxic Spans Detection requires identifying the spans of a text that are considered toxic, providing a valuable automatic tool for moderating online content. This paper presents the second-place method for the task, an ensemble of two approaches. One approach combines different embedding methods to extract diverse semantic and syntactic representations of words in context; the other utilizes extra data with a slightly customized self-training procedure, a semi-supervised learning technique, for sequence tagging problems. Both of our architectures take advantage of a strong language model fine-tuned on a toxic classification task. Although experimental evidence indicates that the first approach is more effective than the second, combining them yields our best result, an F1-score of 70.77 on the test dataset.
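A minimal, framework-agnostic sketch of the self-training idea referred to above: a tagger is trained on the labelled data, used to pseudo-label unlabelled sentences, and only the most confident pseudo-labels are added back to the training set for the next round. The `train`/`predict` interfaces, the sentence-level confidence, and the threshold are hypothetical placeholders, not the specific customization used in the paper.

```python
from typing import Callable, List, Tuple

Tokens = List[str]
Tags = List[str]

def self_train(
    train: Callable[[List[Tuple[Tokens, Tags]]], None],
    predict: Callable[[Tokens], Tuple[Tags, float]],
    labeled: List[Tuple[Tokens, Tags]],
    unlabeled: List[Tokens],
    threshold: float = 0.9,
    rounds: int = 3,
) -> List[Tuple[Tokens, Tags]]:
    """Generic self-training loop for sequence tagging.

    `train` fits the tagger on (tokens, tags) pairs; `predict` returns predicted
    tags plus a sentence-level confidence. Each round, pseudo-labelled sentences
    whose confidence exceeds `threshold` are moved into the training set.
    Returns the final (labelled + pseudo-labelled) training set.
    """
    data = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        train(data)
        confident, remaining = [], []
        for tokens in pool:
            tags, conf = predict(tokens)
            if conf >= threshold:
                confident.append((tokens, tags))   # accept pseudo-label
            else:
                remaining.append(tokens)           # keep for later rounds
        if not confident:
            break
        data.extend(confident)
        pool = remaining
    return data
```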