TTPA: Token-level Tool-use Preference Alignment Training Framework with Fine-grained Evaluation

Chengrui Huang; Shen Gao; Zhengliang Shi; Dongsheng Wang; Shuo Shang

doi:10.18653/v1/2025.findings-emnlp.882

Abstract

Existing tool-learning methods usually rely on supervised fine-tuning and often overlook fine-grained optimization of internal tool-call details, which limits their preference alignment and error discrimination. To overcome these challenges, we propose the **T**oken-level **T**ool-use **P**reference **A**lignment Training Framework (TTPA), a training paradigm for constructing token-level tool-use preference datasets that align LLMs with fine-grained preferences using a novel error-oriented scoring mechanism. TTPA first introduces reversed dataset construction, a method for creating high-quality, multi-turn tool-use datasets by reversing the generation flow. Additionally, we propose _Preference Oriented Tool-use Dataset Construction_ to capture fine-grained preferences by modeling token-level differences during generation. To address biases in scoring, we introduce the _Error-oriented Scoring Mechanism_, which quantifies tool-call errors and can be used as a training signal. Extensive experiments on three diverse benchmark datasets demonstrate that TTPA significantly improves tool-use performance while showing strong generalization ability across models and datasets.

Anthology ID: 2025.findings-emnlp.882
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 16240–16255
URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.882/
DOI: 10.18653/v1/2025.findings-emnlp.882
Cite (ACL): Chengrui Huang, Shen Gao, Zhengliang Shi, Dongsheng Wang, and Shuo Shang. 2025. TTPA: Token-level Tool-use Preference Alignment Training Framework with Fine-grained Evaluation. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 16240–16255, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): TTPA: Token-level Tool-use Preference Alignment Training Framework with Fine-grained Evaluation (Huang et al., Findings 2025)
PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.882.pdf
Checklist: 2025.findings-emnlp.882.checklist.pdf
