@inproceedings{chiang-etal-2025-tract,
title = "{TRACT}: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning for {LLM}-as-a-Judge",
author = "Chiang, Cheng-Han and
Lee, Hung-yi and
Lukasik, Michal",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.147/",
pages = "2934--2952",
ISBN = "979-8-89176-251-0",
abstract = "The LLM-as-a-judge paradigm uses large language models (LLMs) for automated text evaluation, assigning a score to the input based on scoring rubrics. Existing methods for fine-tuning LLM-as-a-judge use cross-entropy (CE) loss, which neglects the numeric nature of score prediction. Recent work addresses numerical prediction limitations of LLM fine-tuning through regression-aware fine-tuning but does not consider chain-of-thought (CoT) reasoning for score prediction. In this paper, we introduce TRACT (Two-stage Regression-Aware fine-tuning with CoT), which combines CoT reasoning with regression-aware training. TRACT uses a two-stage process: first, it fine-tunes the seed LLM to generate CoTs, which serve as the training data for the second stage; next, it uses these self-generated CoTs to retrain the seed LLM. The fine-tuning objective of TRACT applies CE loss for CoT reasoning and regression-aware loss for the score. Experiments across four LLM-as-a-judge datasets and two LLMs show that TRACT significantly outperforms existing methods. Extensive ablation studies validate the effectiveness of each component in TRACT."
}
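
The abstract describes a combined objective: cross-entropy on the chain-of-thought tokens plus a regression-aware term on the predicted score. Below is a minimal sketch of what such a combined loss could look like, assuming (as in prior regression-aware fine-tuning work) that the regression term is a squared error between the gold score and the expected score under the model's distribution over score tokens; the score range (1-5), the weighting factor `lam`, and all function and variable names are illustrative assumptions, not the paper's exact implementation.

```python
# Hedged sketch of a TRACT-style combined loss: cross-entropy on the CoT tokens
# plus a regression-aware (squared-error) term on the numeric score token.
# Score range, loss weight, and tensor layout are illustrative assumptions.
import torch
import torch.nn.functional as F

def tract_style_loss(logits, labels, score_pos, score_token_ids, gold_score, lam=1.0):
    """
    logits:          (seq_len, vocab) next-token logits for one example
    labels:          (seq_len,) target token ids, already shifted/aligned;
                     -100 marks positions excluded from the CE loss
    score_pos:       position whose next token is the numeric score
    score_token_ids: tensor of vocabulary ids for the score tokens, e.g. "1".."5"
    gold_score:      float gold score for this example
    """
    # Cross-entropy over the supervised tokens (the CoT rationale).
    ce = F.cross_entropy(logits, labels, ignore_index=-100)

    # Regression-aware term: expected score under the model's distribution
    # restricted to the score tokens, pulled toward the gold score.
    score_logits = logits[score_pos, score_token_ids]            # (num_scores,)
    probs = torch.softmax(score_logits, dim=-1)
    score_values = torch.arange(1, len(score_token_ids) + 1, dtype=probs.dtype)
    expected_score = (probs * score_values).sum()
    reg = (expected_score - gold_score) ** 2

    return ce + lam * reg
```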