Dulhan Jayalath


2025

PRISM: Efficient Long-Range Reasoning With Short-Context LLMs
Dulhan Jayalath | James Bradley Wendt | Nicholas Monath | Sandeep Tata | Beliz Gunel
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Long-range tasks demand reasoning over long inputs. However, existing solutions are limited, e.g., long-context models require large compute budgets, parameter-efficient fine-tuning (PEFT) needs training data, and retrieval-augmented generation (RAG) entails complex task-specific designs. Though in-context approaches overcome many of these issues, methods with short-context LLMs are inefficient, trading context for processing more tokens. We introduce **PRISM**, a highly token-efficient in-context method based on structured schemas that outperforms baselines on diverse tasks with **4x shorter contexts**. This approach produces concise outputs and efficiently leverages key-value (KV) caches to **reduce costs by up to 54%**. PRISM scales down to tiny contexts without increasing costs or sacrificing quality, and generalizes to new tasks with minimal effort by generating schemas from task descriptions.
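To make the idea in the abstract concrete, below is a minimal sketch of chunk-by-chunk processing with a short-context model that maintains a schema-structured state, which is the general flavor of in-context long-range methods the abstract describes. The schema, prompt wording, and the `call_llm` helper are hypothetical placeholders, not PRISM's actual implementation; see the paper for the real method and schemas.

```python
import json

# Hypothetical schema for a question-answering task over a long document.
# PRISM's actual schemas are defined in the paper; this one is illustrative only.
SCHEMA = {
    "relevant_facts": [],   # facts gathered so far that bear on the query
    "partial_answer": "",   # best current answer given the chunks seen
}

def call_llm(prompt: str) -> str:
    """Placeholder for a short-context LLM call (plug in any model client)."""
    raise NotImplementedError("supply your own model here")

def chunk(text: str, size: int = 2000) -> list[str]:
    """Split a long input into pieces that fit a short context window."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def process_long_input(document: str, query: str) -> str:
    state = dict(SCHEMA)
    for piece in chunk(document):
        prompt = (
            f"Query: {query}\n"
            f"Current state (JSON): {json.dumps(state)}\n"
            f"Next chunk of the document:\n{piece}\n\n"
            "Update the JSON state with any information from this chunk "
            "that is relevant to the query. Return only valid JSON."
        )
        state = json.loads(call_llm(prompt))
    # The final answer is produced from the compact structured state,
    # not from the full document.
    return call_llm(
        f"Query: {query}\nState (JSON): {json.dumps(state)}\n"
        "Answer the query using only the state above."
    )
```

In a sketch like this, keeping the prompt structure stable across chunks is one plausible way a key-value cache could be reused between calls, which is the kind of efficiency the abstract attributes to PRISM's up-to-54% cost reduction; the paper describes the actual caching strategy.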