FlowHN: Adaptive Token Routing for Efficient Parallel Hybrid Networks
Mohammad Mahdi Moradi, Walid Ahmed, Shuangyue Wen, Sudhir Mudur, Weiwei Zhang, Yang Liu
Abstract
Production LLMs must balance modeling quality with predictable latency, stable accelerator utilization, and cost-efficient scaling, constraints that remain difficult for existing architectures to satisfy. Transformers provide strong reasoning but incur quadratic complexity, while state-space models (SSMs) scale efficiently yet lack fine-grained interactions; prior hybrids either introduce sequential bottlenecks or rely on learned routing that complicates deployment. We present FlowHN, a deployment-oriented parallel hybrid architecture that enables deterministic conditional computation via FLOP-aware token circulation across attention and SSM branches. Instead of dynamic expert routing, FlowHN performs hardware-aligned token scheduling that balances workloads, reduces synchronization stalls, and preserves full parameter utilization. Across 135M–1B models, FlowHN achieves up to 4× higher throughput and 15% higher MFU than strong Transformer, SSM, and hybrid baselines while maintaining competitive accuracy on reasoning, coding, and long-context tasks up to 32K tokens. FlowHN is designed to integrate directly into existing hybrid pipelines without changes to optimizers, training stacks, or inference serving infrastructure, making it practical for real-world deployment.
- Anthology ID: 2026.acl-industry.83
- Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
- Month: July
- Year: 2026
- Address: San Diego, California, USA
- Editors: Yunyao Li, Georg Rehm, Mei Tu
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 1170–1181
- URL: https://preview.aclanthology.org/ingest-acl/2026.acl-industry.83/
- Cite (ACL): Mohammad Mahdi Moradi, Walid Ahmed, Shuangyue Wen, Sudhir Mudur, Weiwei Zhang, and Yang Liu. 2026. FlowHN: Adaptive Token Routing for Efficient Parallel Hybrid Networks. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 1170–1181, San Diego, California, USA. Association for Computational Linguistics.
- Cite (Informal): FlowHN: Adaptive Token Routing for Efficient Parallel Hybrid Networks (Moradi et al., ACL 2026)
- PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-industry.83.pdf
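
The abstract describes FLOP-aware token scheduling across an attention branch and an SSM branch but does not state the scheduling rule. As a rough illustration of the general idea only, and not the authors' method, the sketch below splits a sequence so that a quadratic-cost branch and a linear-cost branch receive approximately equal FLOPs. The cost formulas, their constants, and the names `attention_flops`, `ssm_flops`, and `balanced_split` are all assumptions made for this example.

```python
def attention_flops(k: int, d: int) -> int:
    # Hypothetical cost model: projections scale as k*d^2,
    # score and value matmuls scale as k^2*d (quadratic in tokens).
    return 8 * k * d * d + 4 * k * k * d


def ssm_flops(m: int, d: int, state: int) -> int:
    # Hypothetical cost model: projections plus a scan, both linear in tokens.
    return 6 * m * d * d + 6 * m * d * state


def balanced_split(n: int, d: int, state: int = 16) -> int:
    """Return how many of n tokens go to the attention branch so that the
    two branches have the most similar estimated FLOPs; the remaining
    n - k tokens go to the SSM branch."""
    return min(
        range(n + 1),
        key=lambda k: abs(attention_flops(k, d) - ssm_flops(n - k, d, state)),
    )


if __name__ == "__main__":
    for n in (1024, 4096, 32768):
        k = balanced_split(n, d=1024)
        print(f"n={n}: attention tokens={k}, SSM tokens={n - k}")
```

Because the split depends only on the sequence length and fixed model dimensions, it is deterministic and needs no learned router. Under this assumed cost model, the share of tokens sent to the attention branch shrinks as the sequence grows, since the quadratic term dominates at long context.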