@inproceedings{karypis-etal-2024-extending,
title = "Extending Input Contexts of Language Models through Training on Segmented Sequences",
author = "Karypis, Petros and
McAuley, Julian and
Karypis, George",
editor = "Duh, Kevin and
Gomez, Helena and
Bethard, Steven",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2024",
month = jun,
year = "2024",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-naacl.191/",
doi = "10.18653/v1/2024.findings-naacl.191",
pages = "3040--3052",
    abstract = "Effectively training language models on long inputs poses many technical challenges. As a cost consideration, language models are pretrained on a fixed sequence length before being adapted to longer sequences. We explore various methods for adapting models to longer inputs by training on segmented sequences and an interpolation-based method for extending absolute positional embeddings. We develop a training procedure to extend the input context size of pretrained models with no architectural changes and no additional memory costs than training on the original input lengths. By sub-sampling segments from long inputs while maintaining their original position, the model is able to learn new positional interactions. Our method benefits both models trained with absolute positional embeddings, by extending their input contexts, as well as popular relative positional embedding methods, showing a reduced perplexity on sequences longer than they were trained on. We demonstrate our method can extend input contexts by a factor of 4{\texttimes} while improving perplexity."
}
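
To illustrate the idea described in the abstract, here is a minimal sketch of training on segmented sequences while preserving original position indices. This is not the authors' released code; the function name `sample_segments`, the uniform segment-sampling scheme, and all parameter names are illustrative assumptions.

```python
# Hypothetical sketch: sub-sample non-overlapping segments from a long sequence
# while keeping each token's ORIGINAL position index, so a model trained at a
# short fixed length still sees large position values.
import random

def sample_segments(token_ids, train_len=1024, num_segments=4, seed=0):
    """Pick `num_segments` non-overlapping chunks whose total length is
    `train_len`; return (tokens, positions) with original positions kept."""
    rng = random.Random(seed)
    seg_len = train_len // num_segments
    max_start = len(token_ids) - seg_len
    # Candidate starts on a seg_len grid guarantee the chunks do not overlap.
    starts = sorted(rng.sample(range(0, max_start, seg_len), num_segments))
    tokens, positions = [], []
    for s in starts:
        tokens.extend(token_ids[s:s + seg_len])
        positions.extend(range(s, s + seg_len))  # original positions, not 0..train_len-1
    return tokens, positions

# Usage: a 4096-token document becomes a 1024-token training example whose
# position IDs still range up to ~4096, exposing the model to longer-range
# positional interactions at the memory cost of the short input length.
long_doc = list(range(4096))  # stand-in for real token IDs
toks, pos = sample_segments(long_doc)
print(len(toks), max(pos))
```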