Optimizing Packing and Shuffling Strategies for Enhanced Performance in Generative Language Models

Yanbing Chen; Ruilin Wang; Zihao Yang; Lavender Yao Jiang; Eric Karl Oermann

Abstract

Packing and shuffling tokens is a common practice in training auto-regressive language models to prevent overfitting and improve efficiency. Documents are typically concatenated into chunks of maximum sequence length (MSL) and shuffled in chunks of tokens (atom-size chunks), which can break the context within documents. An alternative approach is padding, which places only one document in each chunk. To optimize both packing strategies (concatenation vs. padding), we explored the optimal atom size for shuffling and compared performance and efficiency. We found that in the most common setup (where average document length is greater than MSL), matching atom size to MSL yields the lowest perplexity, controlling for dataset. In addition, padding yields lower final perplexity than concatenation at the cost of lower efficiency. This trade-off informs the choice of shuffling and packing methods in training LMs.
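To make the two packing strategies and the role of atom size concrete, here is a minimal sketch that is not the authors' code. It assumes documents are already tokenized into lists of integer ids, that concatenated documents are separated by a hypothetical `EOS` id, that padding uses a hypothetical `PAD` id, and that any trailing tokens that do not fill a whole atom or sequence are dropped. The function names `concat_and_shuffle`, `pad_and_shuffle`, and `token_efficiency` are illustrative only.

```python
import random
from typing import List

EOS = 0   # hypothetical end-of-document token id (assumption)
PAD = -1  # hypothetical padding token id (assumption)


def concat_and_shuffle(docs: List[List[int]], msl: int, atom_size: int,
                       seed: int = 0) -> List[List[int]]:
    """Concatenation: join all documents, shuffle atom-size chunks, cut into MSL sequences."""
    stream: List[int] = []
    for doc in docs:
        stream.extend(doc)
        stream.append(EOS)  # mark the document boundary (assumed convention)
    # Split the stream into atoms of atom_size tokens; drop a trailing partial atom.
    n_atoms = len(stream) // atom_size
    atoms = [stream[i * atom_size:(i + 1) * atom_size] for i in range(n_atoms)]
    random.Random(seed).shuffle(atoms)
    # Re-join the shuffled atoms and cut into training sequences of length msl.
    shuffled = [tok for atom in atoms for tok in atom]
    n_seq = len(shuffled) // msl
    return [shuffled[i * msl:(i + 1) * msl] for i in range(n_seq)]


def pad_and_shuffle(docs: List[List[int]], msl: int, seed: int = 0) -> List[List[int]]:
    """Padding: every chunk holds tokens from exactly one document, padded to msl."""
    chunks: List[List[int]] = []
    for doc in docs:
        # Documents longer than msl are split into several single-document chunks.
        for start in range(0, len(doc), msl):
            piece = doc[start:start + msl]
            chunks.append(piece + [PAD] * (msl - len(piece)))
    random.Random(seed).shuffle(chunks)
    return chunks


def token_efficiency(chunks: List[List[int]]) -> float:
    """Fraction of positions that hold real (non-PAD) tokens."""
    total = sum(len(c) for c in chunks)
    real = sum(1 for c in chunks for t in c if t != PAD)
    return real / total if total else 0.0


if __name__ == "__main__":
    docs = [[1, 2, 3, 4, 5], [6, 7], [8, 9, 10, 11, 12, 13, 14]]
    print(concat_and_shuffle(docs, msl=4, atom_size=4))
    print(pad_and_shuffle(docs, msl=4))
    print(token_efficiency(pad_and_shuffle(docs, msl=4)))
```

With `atom_size == msl`, each training sequence is one contiguous slice of the concatenated stream; with a smaller atom size, a sequence is stitched together from several non-adjacent slices. For the toy documents above, padding fills only 14 of 20 positions with real tokens, which illustrates the efficiency cost the abstract attributes to padding.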

Anthology ID:: 2026.acl-srw.124
Volume:: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Santosh T.Y.S.S., Juan Diego Rodriguez, Ona de Gibert
Venue:: ACL
Publisher:: Association for Computational Linguistics
Pages:: 1399–1416
URL:: https://preview.aclanthology.org/ingest-acl/2026.acl-srw.124/
Cite (ACL):: Yanbing Chen, Ruilin Wang, Zihao Yang, Lavender Yao Jiang, and Eric Karl Oermann. 2026. Optimizing Packing and Shuffling Strategies for Enhanced Performance in Generative Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 1399–1416, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: Optimizing Packing and Shuffling Strategies for Enhanced Performance in Generative Language Models (Chen et al., ACL 2026)
PDF:: https://preview.aclanthology.org/ingest-acl/2026.acl-srw.124.pdf