Fengge Wu

2026

CondenseFlow: Scalable Latent Space Collaboration via Semantic Compression for Multi-Agent Systems
Xiaoyu Chen | Fengge Wu | Zhao Junsuo | Yun Fan
Findings of the Association for Computational Linguistics: ACL 2026

Full-state latent communication in LLM-based multi-agent systems offers richer semantics than text but suffers from memory overhead scaling linearly with collaboration rounds. We propose CondenseFlow, which introduces the Latent Thought Condenser (LTC)—a lightweight module using learnable semantic probes to compress KV caches into fixed-size representations, achieving 𝒪(1) communication complexity regardless of context length. We theoretically prove that compression error is bounded by attention concentration and accumulates controllably across rounds. On seven benchmarks spanning six models, CondenseFlow reduces KV cache memory by over 99% and inference latency by approximately 20% compared to dense transfer with negligible accuracy degradation, while outperforming text-based methods by 1.7 percentage points on average across all configurations. Code is available at https://github.com/xxy33/condenseflow.
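As an illustration of why probe-based compression gives a fixed-size message, here is a minimal sketch that assumes the probes act as cross-attention queries over the cached keys. This is an assumption for illustration, not the paper's LTC: the names `condense_kv` and `random_matrix` are hypothetical, and the probes are random vectors rather than trained parameters. The point is only that the output size depends on the number of probes, not on the context length.

```python
import math
import random


def condense_kv(keys, values, probes):
    """Summarize n cached (key, value) rows into len(probes) value rows.

    Hypothetical sketch: each probe is used as an attention query over the
    keys, and its softmax weights pool the values into one summary row.
    """
    dim = len(probes[0])
    condensed = []
    for probe in probes:
        # Scaled dot-product score of this probe against every cached key.
        scores = [
            sum(p * k for p, k in zip(probe, key)) / math.sqrt(dim)
            for key in keys
        ]
        # Softmax with max subtraction for numerical stability.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average of the value rows (a convex combination).
        summary = [
            sum(w * row[j] for w, row in zip(weights, values))
            for j in range(len(values[0]))
        ]
        condensed.append(summary)
    return condensed


def random_matrix(rows, cols, rng):
    """Stand-in data: a rows x cols matrix of Gaussian samples."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
probes = random_matrix(4, 8, rng)  # 4 probes of width 8 (arbitrary sizes)
short = condense_kv(random_matrix(16, 8, rng), random_matrix(16, 8, rng), probes)
long = condense_kv(random_matrix(256, 8, rng), random_matrix(256, 8, rng), probes)
print(len(short), len(short[0]), len(long), len(long[0]))
```

Both the 16-row and the 256-row caches condense to the same 4 x 8 shape, which is the sense in which the message size is constant in context length under this assumed design.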

