Scaling Context, Not Parameters: Training a Compact 7B Language Model for Efficient Long-Context Processing

Chen Wu, Yin Song


Abstract
We present MegaBeam-Mistral-7B, a language model that supports a 512K-token context length. Our work addresses practical limitations in long-context training, supporting real-world tasks such as compliance monitoring and verification. Evaluated on three long-context benchmarks, our 7B-parameter model demonstrates superior in-context learning performance on HELMET and robust retrieval and tracing capabilities on RULER. It is currently the only open model to achieve competitive long-range reasoning on BABILong at 512K context length without RAG or targeted fine-tuning. Released as fully open source under the Apache 2.0 license, the model has been downloaded over 100,000 times on Hugging Face.
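For readers who want to try the released checkpoint, the sketch below loads it with the Hugging Face Transformers library and runs a single long-document prompt. The repository id and the prompt are assumptions for illustration only; consult the authors' Hugging Face page for the official MegaBeam-Mistral-7B release and its recommended inference settings.

# Minimal sketch: loading a long-context checkpoint from the Hugging Face Hub.
# The repo id below is an assumption for illustration, not necessarily the
# official release name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aws-prototyping/MegaBeam-Mistral-7B-512k"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # requires `accelerate`; shards across available GPUs
)

# Illustrative long-context task (e.g., compliance monitoring over a contract).
prompt = "Summarize the compliance obligations in the following contract:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))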
Anthology ID: 2025.acl-industry.6
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Georg Rehm, Yunyao Li
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 61–68
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-industry.6/
Cite (ACL): Chen Wu and Yin Song. 2025. Scaling Context, Not Parameters: Training a Compact 7B Language Model for Efficient Long-Context Processing. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), pages 61–68, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Scaling Context, Not Parameters: Training a Compact 7B Language Model for Efficient Long-Context Processing (Wu & Song, ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-industry.6.pdf