Structured Pruning for Diverse Best-of-N Reasoning Optimization

Hieu Trung Nguyen, Bao Nguyen, Viet Anh Nguyen


Abstract
Model pruning in transformer-based language models, traditionally seen as a means of saving computation, can enhance a model’s reasoning capabilities. In this work, we uncover the surprising phenomenon that selectively pruning certain attention heads improves reasoning performance, particularly on challenging tasks. Motivated by this observation, we propose SPRINT, a novel contrastive learning framework that dynamically selects the optimal head and layer to prune during inference. By aligning question embeddings with head embeddings, our approach identifies the pruned-head configurations that yield more accurate reasoning. Extensive experiments demonstrate that our method significantly outperforms traditional best-of-N and random head-selection strategies on the MATH500 and GSM8K datasets.
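The abstract's core selection step, matching a question embedding against per-head embeddings to choose which head to prune, can be sketched as a simple cosine-similarity lookup. This is a minimal illustration, not the paper's actual implementation: the function name, the shapes, and the use of plain cosine similarity are all assumptions; SPRINT's learned contrastive alignment is more involved.

```python
import numpy as np

def select_head_to_prune(question_emb: np.ndarray, head_embs: np.ndarray):
    """Illustrative sketch: pick the attention head whose (learned)
    embedding is most aligned with the question embedding.

    question_emb : shape (d,)        -- embedding of the input question
    head_embs    : shape (n_heads, d) -- one embedding per (layer, head)

    Names and scoring rule are hypothetical, not the paper's API.
    """
    q = question_emb / np.linalg.norm(question_emb)
    H = head_embs / np.linalg.norm(head_embs, axis=1, keepdims=True)
    scores = H @ q                      # cosine similarity per head
    return int(np.argmax(scores)), scores

# Toy usage: 4 candidate head embeddings in 3-d; the question embedding
# is constructed to sit almost on top of head 2.
rng = np.random.default_rng(0)
heads = rng.normal(size=(4, 3))
question = heads[2] + 0.01 * rng.normal(size=3)
best, scores = select_head_to_prune(question, heads)
```

At inference time, the selected index would map back to a (layer, head) pair whose attention output is zeroed out before generating the best-of-N candidates.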
Anthology ID:
2025.findings-acl.1225
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
23911–23922
URL:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.1225/
Cite (ACL):
Hieu Trung Nguyen, Bao Nguyen, and Viet Anh Nguyen. 2025. Structured Pruning for Diverse Best-of-N Reasoning Optimization. In Findings of the Association for Computational Linguistics: ACL 2025, pages 23911–23922, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Structured Pruning for Diverse Best-of-N Reasoning Optimization (Nguyen et al., Findings 2025)
PDF:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.1225.pdf