Abstract
Effectively training language models on long inputs poses many technical challenges. As a cost consideration, language models are pretrained on a fixed sequence length before being adapted to longer sequences. We explore various methods for adapting models to longer inputs by training on segmented sequences and an interpolation-based method for extending absolute positional embeddings. We develop a training procedure to extend the input context size of pretrained models with no architectural changes and no additional memory cost beyond training on the original input lengths. By sub-sampling segments from long inputs while maintaining their original positions, the model is able to learn new positional interactions. Our method benefits both models trained with absolute positional embeddings, by extending their input contexts, and popular relative positional embedding methods, showing reduced perplexity on sequences longer than those they were trained on. We demonstrate that our method can extend input contexts by a factor of 4× while improving perplexity.
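The segment sub-sampling idea described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration rather than the authors' released code: it samples a few contiguous segments from a long token sequence and keeps each token's original position id, so the resulting training example has the same length (and memory cost) as a standard-length input while exposing the model to position indices far beyond the pretraining context. The function name, arguments, and segment-sampling details are illustrative assumptions.

```python
import torch

def sample_segmented_sequence(tokens, train_len, num_segments):
    """Sketch: build a train_len-long example from a much longer input by
    concatenating a few contiguous segments while keeping their original
    absolute position ids (segments may occasionally overlap in this toy
    version; the exact sampling scheme is an assumption)."""
    seg_len = train_len // num_segments
    max_start = len(tokens) - seg_len
    # Sample segment start offsets anywhere in the long input, in order.
    starts = sorted(torch.randint(0, max_start + 1, (num_segments,)).tolist())

    input_ids, position_ids = [], []
    for s in starts:
        input_ids.extend(tokens[s:s + seg_len])
        # Keep the original absolute positions so the model is trained on
        # position indices beyond its original context length.
        position_ids.extend(range(s, s + seg_len))

    return torch.tensor(input_ids), torch.tensor(position_ids)

# Usage: a 16k-token document trained with a 2k context budget.
long_doc = list(range(16384))
input_ids, position_ids = sample_segmented_sequence(long_doc, train_len=2048, num_segments=4)
```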
- Anthology ID:
- 2024.findings-naacl.191
- Volume:
- Findings of the Association for Computational Linguistics: NAACL 2024
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Kevin Duh, Helena Gomez, Steven Bethard
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3040–3052
- URL:
- https://aclanthology.org/2024.findings-naacl.191
- Cite (ACL):
- Petros Karypis, Julian McAuley, and George Karypis. 2024. Extending Input Contexts of Language Models through Training on Segmented Sequences. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 3040–3052, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Extending Input Contexts of Language Models through Training on Segmented Sequences (Karypis et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/ingestion-checklist/2024.findings-naacl.191.pdf