ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection

Tao Liu (刘涛); Taiqiang Wu; Runming Yang; Shaoning Sun; Junjie Wang; Yujiu Yang

Abstract

Supervised fine-tuning (SFT) is a fundamental post-training strategy to align Large Language Models (LLMs) with human intent. However, traditional SFT often ignores the one-to-many nature of language by forcing alignment with a single reference answer, causing the model to overfit to non-core expressions. Although our empirical analysis suggests that introducing multiple reference answers can mitigate this issue, the prohibitive data and computational costs necessitate a strategic shift: prioritizing the mitigation of single-reference overfitting over the costly pursuit of answer diversity. To achieve this, we reveal the intrinsic connection between token probability and semantic importance: high-probability tokens carry the core logical framework, while low-probability tokens are mostly replaceable expressions. Based on this insight, we propose ProFit, which selectively masks low-probability tokens to prevent surface-level overfitting. Extensive experiments confirm that ProFit consistently outperforms traditional SFT baselines on general reasoning and mathematical benchmarks.
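The abstract does not specify the exact selection rule, so the following is only a minimal sketch of probability-guided token masking under stated assumptions: the model's probability p_t of each reference token is already known, tokens with p_t below a hypothetical fixed `threshold` are masked out, and the negative log-likelihood is averaged over the remaining tokens only. The function names and the thresholding rule are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def select_high_prob_tokens(token_probs: Sequence[float], threshold: float) -> List[bool]:
    """Return a keep-mask: True for tokens whose probability is >= threshold.

    Hypothetical rule: the abstract only says low-probability tokens are
    masked; a fixed threshold is an illustrative assumption.
    """
    return [p >= threshold for p in token_probs]


def masked_sft_loss(token_probs: Sequence[float], threshold: float) -> float:
    """Mean negative log-likelihood over the kept (high-probability) tokens."""
    keep = select_high_prob_tokens(token_probs, threshold)
    kept_nll = [-math.log(p) for p, k in zip(token_probs, keep) if k]
    if not kept_nll:
        # Every token was masked: nothing to average, so return zero loss.
        return 0.0
    return sum(kept_nll) / len(kept_nll)


def standard_sft_loss(token_probs: Sequence[float]) -> float:
    """Plain SFT baseline: mean negative log-likelihood over all tokens."""
    return sum(-math.log(p) for p in token_probs) / len(token_probs)


if __name__ == "__main__":
    # Hypothetical per-token probabilities of a single reference answer.
    probs = [0.9, 0.8, 0.05, 0.7, 0.1]
    print(select_high_prob_tokens(probs, threshold=0.2))
    print(masked_sft_loss(probs, threshold=0.2), standard_sft_loss(probs))
```

With these example probabilities the two low-probability tokens (0.05 and 0.1) are excluded, so they no longer dominate the loss as they do in the plain SFT average. In a real training loop the probabilities would come from the softmax of the model's logits at the reference tokens, and the mask would typically be computed without gradients (for example under `torch.no_grad()`); whether ProFit uses a fixed threshold, a per-sequence ratio, or another criterion is not stated in this abstract.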

Anthology ID: 2026.findings-acl.755
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 15383–15401
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.755/
Cite (ACL): Tao Liu, Taiqiang Wu, Runming Yang, Shaoning Sun, Junjie Wang, and Yujiu Yang. 2026. ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection. In Findings of the Association for Computational Linguistics: ACL 2026, pages 15383–15401, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection (Liu et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.755.pdf
Checklist: 2026.findings-acl.755.checklist.pdf