DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models

Sunghee Jung; Donghun Lee; Shinbok Lee; Gaeun Seo; Daniel Lee; Byeongil Ko; Junrae Cho; Kihyun Kim; EungGyun Kim; Myeongcheol Shin

Abstract

Tool-Augmented Large Language Models (TA-LLMs) have shown promise in real-world applications, but face challenges in handling incomplete queries and out-of-scope requests. While existing approaches rely mainly on Supervised Fine-Tuning with expert trajectories, we propose DiaTool-DPO, a novel method that enhances TA-LLMs’ dialogue capabilities through Direct Preference Optimization. We model TA-LLM interactions as a Markov Decision Process with 5 distinct dialogue states and categorize user queries into 3 types based on their state transition trajectories. We automatically construct paired trajectory datasets of correct and incorrect dialogue flows and introduce a specialized objective loss for dialogue control. Our comprehensive evaluation demonstrates that DiaTool-DPO approaches GPT-4o’s performance (94.8% in information gathering, 91% in tool call rejection) with substantial improvements over baseline (44% and 9.6% respectively) while maintaining core functionality. Our approach opens new possibilities for developing TA-LLMs that can handle diverse real-world scenarios without requiring additional expert demonstrations or human labeling.

Anthology ID: 2025.sigdial-1.32
Volume: Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Month: August
Year: 2025
Address: Avignon, France
Editors: Frédéric Béchet, Fabrice Lefèvre, Nicholas Asher, Seokhwan Kim, Teva Merlin
Venue: SIGDIAL
SIG: SIGDIAL
Publisher: Association for Computational Linguistics
Pages: 397–416
URL: https://preview.aclanthology.org/corrections-2025-10/2025.sigdial-1.32/
Cite (ACL): Sunghee Jung, Donghun Lee, Shinbok Lee, Gaeun Seo, Daniel Lee, Byeongil Ko, Junrae Cho, Kihyun Kim, EungGyun Kim, and Myeongcheol Shin. 2025. DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models. In Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 397–416, Avignon, France. Association for Computational Linguistics.
Cite (Informal): DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models (Jung et al., SIGDIAL 2025)
PDF: https://preview.aclanthology.org/corrections-2025-10/2025.sigdial-1.32.pdf
