ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs

Yuzhuang Xu; Xu Han; Yuxuan Li; Wanxiang Che (车万翔)

ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs

Yuzhuang Xu, Xu Han, Yuxuan Li, Wanxiang Che

Abstract

Although existing frameworks for large language model (LLM) inference on CPUs are mature, they fail to fully exploit the computational potential of many-core CPU platforms. Many-core CPUs are widely deployed in web servers and high-end networking devices, and are typically organized into multiple NUMA nodes that group cores and memory. Current frameworks largely overlook the substantial overhead of cross-NUMA memory access, limiting inference scalability and intelligence enabling on such platforms. To address this limitation, we build ArcLight, a lightweight LLM inference architecture designed from the ground up for many-core CPUs. ArcLight integrates efficient memory management and thread scheduling, and introduces finely controlled tensor parallelism to mitigate the cross-node memory access wall. Experimental results show that ArcLight significantly surpasses the performance ceiling of mainstream frameworks, achieving up to 46% higher inference throughput. Moreover, ArcLight maintains compatibility with arbitrary CPU devices. ArcLight is publicly available at https://github.com/OpenBMB/ArcLight.

Anthology ID:: 2026.acl-demo.18
Volume:: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Greg Durrett, Ping Jian
Venue:: ACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 178–186
Language:
URL:: https://preview.aclanthology.org/ingest-acl/2026.acl-demo.18/
DOI:
Bibkey:
Cite (ACL):: Yuzhuang Xu, Xu Han, Yuxuan Li, and Wanxiang Che. 2026. ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 178–186, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs (Xu et al., ACL 2026)
Copy Citation:
PDF:: https://preview.aclanthology.org/ingest-acl/2026.acl-demo.18.pdf

PDF Cite Search Fix data