Rhombus: Incentivizing Coordination in Parallel Thinking through Reinforcement Learning

Ziyuan Nan, Qi Yi, Di Huang, Yutong Wu, Guanhua Huang, Xue Gong, Kejiao Li, Yuhao Jiang, Chenchen Zhang, Zenan Xu, Xing Hu, Bo Zhou


Abstract
Parallel thinking offers a promising avenue for scaling test-time compute in Large Language Models (LLMs), enabling them to explore diverse solution paths simultaneously before aggregating them into a final answer. However, coordinating the exploration and aggregation stages remains challenging, as simple aggregation techniques often incur information loss, failing to preserve the subtle, decision-relevant signals generated during exploration. To overcome this, we propose Rhombus, a parallel thinking framework that explicitly incentivizes coordination between components via end-to-end reinforcement learning. Rhombus employs multiple parallel Proposers to generate compact, decision-focused reasoning cues and a central Synthesizer to integrate them into final predictions, utilizing co-training under a shared task reward to align their interaction. Across challenging mathematical reasoning benchmarks, Rhombus improves accuracy by 6.0% over long chain-of-thought baselines while reducing wall-clock latency by 39.4% under matched token budgets. Our work demonstrates that explicit communication optimization is essential for realizing the accuracy and efficiency gains of parallel reasoning.
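The propose-then-synthesize flow described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the `Cue` format, the `propose`, `synthesize`, `shared_reward`, and `rollout` functions, the confidence-weighted vote, and the arithmetic task are all hypothetical stand-ins, and no policy update is performed. It only shows parallel proposers emitting compact cues, one synthesizer integrating them, and a single task reward shared by every component.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """Compact message from one proposer (hypothetical format, not from the paper)."""
    proposer_id: int
    candidate: int
    confidence: float


def propose(proposer_id: int, question: tuple[int, int]) -> Cue:
    """Toy proposer: adds two numbers; proposer 2 is deliberately wrong."""
    a, b = question
    answer = a + b
    if proposer_id == 2:
        return Cue(proposer_id, answer + 1, confidence=0.3)
    return Cue(proposer_id, answer, confidence=0.9)


def synthesize(cues: list[Cue]) -> int:
    """Toy synthesizer: confidence-weighted vote (a stand-in for a learned model)."""
    scores = defaultdict(float)
    for cue in cues:
        scores[cue.candidate] += cue.confidence
    return max(scores, key=scores.get)


def shared_reward(prediction: int, target: int) -> float:
    """Single task-level reward: 1.0 for a correct final answer, else 0.0."""
    return 1.0 if prediction == target else 0.0


def rollout(question: tuple[int, int], target: int, num_proposers: int = 4):
    """Run proposers in parallel, synthesize, and give every component the same reward."""
    with ThreadPoolExecutor(max_workers=num_proposers) as pool:
        cues = list(pool.map(lambda i: propose(i, question), range(num_proposers)))
    prediction = synthesize(cues)
    reward = shared_reward(prediction, target)
    # Each proposer and the synthesizer receive the identical scalar reward.
    credit = {f"proposer_{cue.proposer_id}": reward for cue in cues}
    credit["synthesizer"] = reward
    return prediction, credit


if __name__ == "__main__":
    prediction, credit = rollout((2, 3), target=5)
    print(prediction, credit)
```

In an actual reinforcement learning setup, the shared scalar in `credit` would presumably feed a policy-gradient update for both roles; that step is omitted here because the abstract does not specify the training algorithm.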
Anthology ID:
2026.findings-acl.1956
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
39258–39270
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1956/
Cite (ACL):
Ziyuan Nan, Qi Yi, Di Huang, Yutong Wu, Guanhua Huang, Xue Gong, Kejiao Li, Yuhao Jiang, Chenchen Zhang, Zenan Xu, Xing Hu, and Bo Zhou. 2026. Rhombus: Incentivizing Coordination in Parallel Thinking through Reinforcement Learning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 39258–39270, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Rhombus: Incentivizing Coordination in Parallel Thinking through Reinforcement Learning (Nan et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1956.pdf
Checklist:
 2026.findings-acl.1956.checklist.pdf