The Best of Both Worlds: Combining Parallel and Sequential Inference Scaling via Aggregation Fine-Tuning
Yafu Li, Zhilin Wang, Tingchen Fu, Ganqu Cui, Sen Yang, Yu Cheng
Abstract
Scaling data and model size has been proven effective for boosting the performance of large language models. In addition to training-time scaling, recent studies have revealed that increasing test-time computational resources can further improve performance. In this work, we introduce Aggregation Fine-Tuning (AFT), a supervised fine-tuning paradigm where the model learns to synthesize multiple draft responses, referred to as proposals, into a single, refined answer, termed aggregation. At inference time, we apply a propose-and-aggregate strategy that iteratively generates and aggregates proposals, effectively scaling inference-time computation without relying on external guidance such as a reward model. Empirical results across benchmark datasets demonstrate that AFT-trained models achieve substantial gains with test-time scaling, outperforming best-of-N baselines while eliminating the need for external reward signals. Notably, an AFT model, fine-tuned from Llama3.1-8B-Base with only 64k data, achieves a 41.3% LC win rate on AlpacaEval 2, surpassing significantly larger LLMs such as Llama3.1-405B-Instruct and GPT-4. By combining sequential refinement and parallel sampling, the propose-and-aggregate framework scales inference-time computation in a flexible manner.
- Anthology ID:
- 2026.findings-acl.1568
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 31369–31389
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1568/
- Cite (ACL):
- Yafu Li, Zhilin Wang, Tingchen Fu, Ganqu Cui, Sen Yang, and Yu Cheng. 2026. The Best of Both Worlds: Combining Parallel and Sequential Inference Scaling via Aggregation Fine-Tuning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 31369–31389, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- The Best of Both Worlds: Combining Parallel and Sequential Inference Scaling via Aggregation Fine-Tuning (Li et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1568.pdf
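
The propose-and-aggregate strategy described in the abstract can be illustrated with a minimal sketch. This is one plausible reading of the abstract, not the authors' implementation: the names `propose_and_aggregate`, `width`, `depth`, and `make_toy_model` are hypothetical, and the assumption that each round's aggregations become the next round's proposals is ours. The two callables stand in for sampling from an AFT-trained model; the toy versions below only make the control flow observable.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signatures: both callables would wrap the same fine-tuned model.
Propose = Callable[[str], str]
Aggregate = Callable[[str, List[str]], str]


def propose_and_aggregate(
    query: str,
    propose: Propose,
    aggregate: Aggregate,
    width: int = 4,
    depth: int = 2,
) -> str:
    """Combine parallel sampling (width) with sequential refinement (depth).

    Assumption: each round's aggregations serve as the next round's
    proposals, and the final round yields a single answer.
    """
    if width < 1 or depth < 1:
        raise ValueError("width and depth must be at least 1")

    # Parallel step: sample `width` independent draft responses.
    drafts = [propose(query) for _ in range(width)]

    # Sequential step: every intermediate round aggregates the current
    # drafts `width` times, producing a refined set of drafts.
    for _ in range(depth - 1):
        drafts = [aggregate(query, drafts) for _ in range(width)]

    # Final round: synthesize the surviving drafts into one answer.
    return aggregate(query, drafts)


def make_toy_model() -> Tuple[Propose, Aggregate, Dict[str, int]]:
    """Deterministic stand-ins for model calls, with call counters."""
    calls = {"propose": 0, "aggregate": 0}

    def propose(query: str) -> str:
        calls["propose"] += 1
        return f"draft-{calls['propose']}"

    def aggregate(query: str, drafts: List[str]) -> str:
        calls["aggregate"] += 1
        # Toy aggregation: merge the distinct drafts into one string.
        return "agg(" + ",".join(sorted(set(drafts))) + ")"

    return propose, aggregate, calls


if __name__ == "__main__":
    propose, aggregate, calls = make_toy_model()
    answer = propose_and_aggregate("example query", propose, aggregate, width=2, depth=2)
    print(answer, calls)
```

Under these assumptions no reward model appears anywhere in the loop: the only inputs are the query and the model's own drafts. `width` controls the parallel sampling budget and `depth` the number of sequential refinement rounds.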