Advancing Process Verification for Large Language Models via Tree-Based Preference Learning
Mingqian He, Yongliang Shen, Wenqi Zhang, Zeqi Tan, Weiming Lu
Abstract
Large Language Models (LLMs) have demonstrated remarkable potential in handling complex reasoning tasks by generating step-by-step rationales. Some methods have proven effective in boosting accuracy by introducing extra verifiers to assess these paths. However, existing verifiers, typically trained on binary-labeled reasoning paths, fail to fully utilize the relative merits of intermediate steps, thereby limiting the effectiveness of the feedback provided. To overcome this limitation, we propose Tree-based Preference Learning Verifier (Tree-PLV), a novel approach that constructs reasoning trees via a best-first search algorithm and collects step-level paired data for preference training. Compared to traditional binary classification, step-level preferences more finely capture the nuances between reasoning steps, allowing for a more precise evaluation of the complete reasoning path. We empirically evaluate Tree-PLV across a range of arithmetic and commonsense reasoning tasks, where it significantly outperforms existing benchmarks. For instance, Tree-PLV achieved substantial performance gains over the Mistral-7B self-consistency baseline on GSM8K (67.55% → 82.79%), MATH (17.00% → 26.80%), CSQA (68.14% → 72.97%), and StrategyQA (82.86% → 83.25%). Additionally, our study explores the appropriate granularity for applying preference learning, revealing that step-level guidance provides feedback that better aligns with the evaluation of the reasoning process.
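The abstract describes two mechanisms: growing a reasoning tree with best-first search and turning sibling steps into step-level preference pairs. The snippet below is a minimal, illustrative sketch of that idea under assumed interfaces, not the paper's implementation: `generate_next_steps` and `estimate_step_value` are hypothetical placeholders standing in for LLM sampling and step scoring (e.g., rollout success rates), and the expansion and pairing details are assumptions made for clarity.

```python
# Sketch (assumed interfaces, not the paper's code): best-first expansion of a
# reasoning tree plus collection of step-level (preferred, rejected) pairs from
# sibling steps that share the same prefix.

import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Node:
    neg_value: float                    # heapq is a min-heap, so store -value
    steps: list = field(compare=False)  # partial reasoning path (list of step strings)


def generate_next_steps(question, steps, k=4):
    """Hypothetical placeholder: sample k candidate next steps from an LLM."""
    return [f"step {len(steps) + 1} (candidate {i})" for i in range(k)]


def estimate_step_value(question, steps):
    """Hypothetical placeholder: score a partial path, e.g. by rollout success rate."""
    return 0.0


def best_first_collect_pairs(question, max_expansions=16, k=4):
    """Expand the highest-value partial path first; rank sibling continuations
    to produce step-level preference pairs for verifier training."""
    frontier = [Node(0.0, [])]
    preference_pairs = []
    for _ in range(max_expansions):
        if not frontier:
            break
        node = heapq.heappop(frontier)  # current best partial path
        children = []
        for step in generate_next_steps(question, node.steps, k):
            path = node.steps + [step]
            value = estimate_step_value(question, path)
            children.append((value, path))
            heapq.heappush(frontier, Node(-value, path))
        # Siblings differ only in their last step: pair the top-ranked
        # continuation against each lower-ranked one.
        children.sort(key=lambda c: c[0], reverse=True)
        for _, worse_path in children[1:]:
            preference_pairs.append((children[0][1], worse_path))
    return preference_pairs
```

Pairs collected this way could then be used to train a step-level verifier with a standard pairwise ranking objective, for example minimizing -log σ(r(preferred) - r(rejected)); the exact scoring and training objective are specified in the paper itself rather than in this sketch.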
- Anthology ID: 2024.emnlp-main.125
- Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 2086–2099
- URL: https://aclanthology.org/2024.emnlp-main.125
- DOI: 10.18653/v1/2024.emnlp-main.125
- Cite (ACL): Mingqian He, Yongliang Shen, Wenqi Zhang, Zeqi Tan, and Weiming Lu. 2024. Advancing Process Verification for Large Language Models via Tree-Based Preference Learning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 2086–2099, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): Advancing Process Verification for Large Language Models via Tree-Based Preference Learning (He et al., EMNLP 2024)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/2024.emnlp-main.125.pdf