Elliot Nelson


2025

EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts
Subhajit Chaudhury | Payel Das | Sarathkrishna Swaminathan | Georgios Kollias | Elliot Nelson | Khushbu Pahwa | Tejaswini Pedapati | Igor Melnyk | Matthew Riemer
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent advances in Large Language Models (LLMs) have yielded impressive successes on many language tasks. However, efficient processing of long contexts using LLMs remains a significant challenge. We introduce **EpMAN** – a method for processing long contexts in an episodic memory module while holistically attending to semantically-relevant context chunks. Output from episodic attention is then used to reweight the decoder’s self-attention to the stored KV cache of the context during training and generation. When an LLM decoder is trained using **EpMAN**, its performance on multiple challenging single-hop long-context recall and question-answering benchmarks is found to be stronger and more robust across the range from 16k to 256k tokens than that of baseline decoders trained with self-attention and of popular retrieval-augmented generation frameworks.
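
The abstract only sketches the reweighting step at a high level. Below is a minimal illustrative sketch (not the authors' implementation), assuming chunk-level relevance scores produced by the episodic module are broadcast to the tokens of each chunk and multiplied into the decoder's attention weights over the cached context; all names (`epman_reweighted_attention`, `chunk_scores`, `chunk_ids`) are hypothetical.

```python
# Hypothetical sketch of chunk-score-based attention reweighting over a KV cache.
import torch
import torch.nn.functional as F

def epman_reweighted_attention(query, k_cache, v_cache, chunk_scores, chunk_ids):
    """
    query:        (n_heads, d_head)          query at the current decoding step
    k_cache:      (n_ctx, n_heads, d_head)   cached keys for the long context
    v_cache:      (n_ctx, n_heads, d_head)   cached values for the long context
    chunk_scores: (n_chunks,)                 episodic relevance score per context chunk
    chunk_ids:    (n_ctx,)                    chunk index of each cached token
    """
    d_head = query.shape[-1]
    # Standard scaled dot-product scores over the cached context tokens.
    scores = torch.einsum("hd,nhd->hn", query, k_cache) / d_head ** 0.5
    # Broadcast each chunk's relevance score to its tokens, then use it to
    # reweight the decoder's softmax attention and renormalize.
    token_weights = chunk_scores[chunk_ids]                  # (n_ctx,)
    attn = F.softmax(scores, dim=-1) * token_weights          # (n_heads, n_ctx)
    attn = attn / attn.sum(dim=-1, keepdim=True)
    return torch.einsum("hn,nhd->hd", attn, v_cache)

# Toy usage with random tensors.
n_ctx, n_heads, d_head, n_chunks = 128, 4, 64, 8
q = torch.randn(n_heads, d_head)
k = torch.randn(n_ctx, n_heads, d_head)
v = torch.randn(n_ctx, n_heads, d_head)
chunk_ids = torch.arange(n_ctx) // (n_ctx // n_chunks)
chunk_scores = F.softmax(torch.randn(n_chunks), dim=-1)
out = epman_reweighted_attention(q, k, v, chunk_scores, chunk_ids)
```

In this sketch the reweighting is applied at inference time to a precomputed KV cache; the paper additionally trains the decoder with the reweighted attention, which this toy example does not show.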