The Best of Both Worlds: Combining Parallel and Sequential Inference Scaling via Aggregation Fine-Tuning

Yafu Li; Zhilin Wang; Tingchen Fu; Ganqu Cui; Sen Yang; Yu Cheng

Abstract

Scaling data and model size has proven effective for boosting the performance of large language models. In addition to training-time scaling, recent studies have revealed that increasing test-time computational resources can further improve performance. In this work, we introduce Aggregation Fine-Tuning (AFT), a supervised fine-tuning paradigm in which the model learns to synthesize multiple draft responses, referred to as proposals, into a single refined answer, termed the aggregation. At inference time, we apply a propose-and-aggregate strategy that iteratively generates and aggregates proposals, scaling inference-time computation without relying on external guidance such as a reward model. Empirical results across benchmark datasets demonstrate that AFT-trained models achieve substantial gains with test-time scaling, outperforming best-of-N baselines while eliminating the need for external reward signals. Notably, an AFT model fine-tuned from Llama3.1-8B-Base with only 64k data achieves a 41.3% LC win rate on AlpacaEval 2, surpassing significantly larger LLMs such as Llama3.1-405B-Instruct and GPT-4. By combining sequential refinement and parallel sampling, the propose-and-aggregate framework scales inference-time computation in a flexible manner.
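The propose-and-aggregate loop described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the authors' implementation: `propose` and `aggregate` are hypothetical callables standing in for calls to an AFT-trained model, `width` is the number of parallel proposals, and `depth` is the number of sequential aggregation rounds. The sketch further assumes that each intermediate round produces `width` aggregations that serve as the proposals for the next round; the abstract does not spell out this detail.

```python
from typing import Callable, List

# Hypothetical interfaces: in practice both would wrap calls to the same model.
Propose = Callable[[str], str]
Aggregate = Callable[[str, List[str]], str]


def propose_and_aggregate(
    prompt: str,
    propose: Propose,
    aggregate: Aggregate,
    width: int = 4,
    depth: int = 2,
) -> str:
    """Sample `width` drafts in parallel, then refine them over `depth` rounds."""
    if width < 1 or depth < 1:
        raise ValueError("width and depth must both be at least 1")

    # Parallel scaling: draw several independent proposals for the prompt.
    drafts = [propose(prompt) for _ in range(width)]

    # Sequential scaling: each intermediate round turns the current drafts
    # into a fresh set of aggregations, which become the next round's drafts.
    for _ in range(depth - 1):
        drafts = [aggregate(prompt, drafts) for _ in range(width)]

    # The final round collapses the drafts into a single answer.
    return aggregate(prompt, drafts)


# Toy stand-ins for the model, used only to make the control flow visible.
calls = {"propose": 0, "aggregate": 0}


def toy_propose(prompt: str) -> str:
    calls["propose"] += 1
    return f"draft-{calls['propose']}"


def toy_aggregate(prompt: str, drafts: List[str]) -> str:
    calls["aggregate"] += 1
    return f"agg{calls['aggregate']}[{len(drafts)}]"


answer = propose_and_aggregate("question", toy_propose, toy_aggregate, width=3, depth=2)
print(answer, calls)
```

Under these assumptions the number of model calls is `width * depth + 1`, so compute can be increased along either axis: more proposals per round (parallel) or more rounds (sequential). No reward model appears anywhere in the loop, which matches the abstract's claim that the strategy needs no external guidance.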

Anthology ID: 2026.findings-acl.1568
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 31369–31389
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1568/
Cite (ACL): Yafu Li, Zhilin Wang, Tingchen Fu, Ganqu Cui, Sen Yang, and Yu Cheng. 2026. The Best of Both Worlds: Combining Parallel and Sequential Inference Scaling via Aggregation Fine-Tuning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 31369–31389, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): The Best of Both Worlds: Combining Parallel and Sequential Inference Scaling via Aggregation Fine-Tuning (Li et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1568.pdf
Checklist: 2026.findings-acl.1568.checklist.pdf