2025
Context Length Alone Hurts LLM Performance Despite Perfect Retrieval
Yufeng Du | Minyang Tian | Srikanth Ronanki | Subendhu Rongali | Sravan Babu Bodapati | Aram Galstyan | Azton Wells | Roy Schwartz | Eliu A Huerta | Hao Peng
Findings of the Association for Computational Linguistics: EMNLP 2025
Large language models (LLMs) often fail to scale their performance on long-context tasks in line with the context lengths they support. This gap is commonly attributed to retrieval failures—the models’ inability to identify information in the long inputs that is relevant to the task they are solving. Accordingly, recent efforts often focus on evaluating and improving LLMs’ retrieval performance: if retrieval is perfect, a model should, in principle, perform just as well on a long input as it does on a short one—or should it? This paper presents findings that the answer to this question may be negative. Our systematic experiments across 5 open- and closed-source LLMs on math, question answering, and coding tasks reveal that, even when models can perfectly retrieve all relevant information, their performance still degrades substantially (13.9%–85%) as input length increases while remaining well within their claimed context lengths. This failure occurs even when the irrelevant tokens are replaced with minimally distracting whitespace, and, more surprisingly, when they are all masked and the models are forced to attend only to the relevant tokens. A similar performance drop is observed when all relevant evidence is placed immediately before the question. Our findings reveal a previously unrealized limitation: the sheer length of the input alone can hurt LLM performance, independent of retrieval quality and without any distraction. They motivate our simple, model-agnostic mitigation strategy that transforms a long-context task into a short-context one by prompting the model to recite the retrieved evidence before attempting to solve the problem. On RULER, we observe a consistent improvement for GPT-4o of up to 4% over an already strong baseline.
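A minimal sketch of the recite-then-answer mitigation described above, assuming a generic chat-style LLM; the prompt wording and the `call_llm` helper are illustrative placeholders, not the paper's exact prompt or code.

```python
def build_recite_then_answer_prompt(context: str, question: str) -> str:
    """Ask the model to first quote the relevant evidence verbatim, then
    answer using only that quoted evidence, turning a long-context problem
    into a short-context one at reasoning time."""
    return (
        f"{context}\n\n"
        f"Question: {question}\n\n"
        "Before answering, quote verbatim every passage from the text above "
        "that is relevant to the question. Then, using only the quoted "
        "passages, reason step by step and give your final answer."
    )

# Hypothetical usage with any chat-style LLM client:
# prompt = build_recite_then_answer_prompt(long_document, "Who signed the treaty?")
# answer = call_llm(prompt)  # call_llm is a placeholder for your LLM API
```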
2023
Low-Resource Compositional Semantic Parsing with Concept Pretraining
Subendhu Rongali | Mukund Sridhar | Haidar Khan | Konstantine Arkoudas | Wael Hamza | Andrew McCallum
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Semantic parsing plays a key role in digital voice assistants such as Alexa, Siri, and Google Assistant by mapping natural language to structured meaning representations. When we want to improve the capabilities of a voice assistant by adding a new domain, the underlying semantic parsing model needs to be retrained using thousands of annotated examples from the new domain, which is time-consuming and expensive. In this work, we present an architecture to perform such domain adaptation automatically, with only a small amount of metadata about the new domain and without any new training data (zero-shot) or with very few examples (few-shot). We use a base seq2seq (sequence-to-sequence) architecture and augment it with a concept encoder that encodes intent and slot tags from the new domain. We also introduce a novel decoder-focused approach to pretrain seq2seq models to be concept aware using Wikidata and use it to help our model learn important concepts and perform well in low-resource settings. We report few-shot and zero-shot results for compositional semantic parsing on the TOPv2 dataset and show that our model outperforms prior approaches in few-shot settings for the TOPv2 and SNIPS datasets.
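A rough sketch of the concept-encoder idea, assuming a PyTorch seq2seq parser; the module below is illustrative and not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ConceptEncoder(nn.Module):
    """Encode the names of new-domain intents and slots so the parser can
    score them at decoding time without retraining its output vocabulary."""
    def __init__(self, token_embedding: nn.Embedding, hidden_dim: int):
        super().__init__()
        self.token_embedding = token_embedding  # shared with the seq2seq encoder
        self.proj = nn.Linear(token_embedding.embedding_dim, hidden_dim)

    def forward(self, concept_token_ids: torch.Tensor) -> torch.Tensor:
        # concept_token_ids: (num_concepts, max_name_len), e.g. the tokenized
        # tag names "IN:GET_WEATHER" or "SL:LOCATION" from the domain metadata
        embedded = self.token_embedding(concept_token_ids)   # (C, L, E)
        pooled = embedded.mean(dim=1)                         # (C, E)
        return self.proj(pooled)                              # (C, H)

# At decoding time, scores over new-domain tags can be computed as a dot
# product between the decoder state and these concept vectors, so unseen
# intents/slots need only their names (metadata), not labeled examples.
```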
2021
Improved Latent Tree Induction with Distant Supervision via Span Constraints
Zhiyang Xu | Andrew Drozdov | Jay Yoon Lee | Tim O’Gorman | Subendhu Rongali | Dylan Finkbeiner | Shilpa Suresh | Mohit Iyyer | Andrew McCallum
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
For over thirty years, researchers have developed and analyzed methods for latent tree induction as an approach for unsupervised syntactic parsing. Nonetheless, modern systems still do not perform well enough compared to their supervised counterparts to have any practical use as structural annotation of text. In this work, we present a technique that uses distant supervision in the form of span constraints (i.e., phrase bracketing) to improve performance in unsupervised constituency parsing. Using a relatively small number of span constraints, we can substantially improve the output from DIORA, an already competitive unsupervised parsing system. Compared with full parse tree annotation, span constraints can be acquired with minimal effort, for example with a lexicon derived from Wikipedia used to find exact text matches. Our experiments show that span constraints based on entities improve constituency parsing on the English WSJ Penn Treebank by more than 5 F1. Furthermore, our method extends to any domain where span constraints are easily attainable, and as a case study we demonstrate its effectiveness by parsing biomedical text from the CRAFT dataset.
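A toy sketch of how span constraints can be scored against a predicted bracketing; the reward function below is illustrative and not DIORA's actual training objective.

```python
def span_constraint_reward(predicted_spans, constraint_spans):
    """Reward a predicted bracketing for containing each constraint span
    (e.g. an entity mention matched via a Wikipedia-derived lexicon) as a
    constituent, and penalize brackets that cross a constraint span."""
    predicted = set(predicted_spans)  # set of (start, end) pairs
    reward = 0.0
    for (cs, ce) in constraint_spans:
        if (cs, ce) in predicted:
            reward += 1.0  # constraint satisfied exactly
        else:
            # penalize any predicted bracket that crosses the constraint span
            crossing = any(ps < cs < pe < ce or cs < ps < ce < pe
                           for (ps, pe) in predicted)
            reward -= 1.0 if crossing else 0.0
    return reward

# Example: the entity span (2, 5) is an exact constituent, so it adds +1.
# print(span_constraint_reward({(0, 7), (2, 5), (5, 7)}, [(2, 5)]))  # 1.0
```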
2020
Unsupervised Parsing with S-DIORA: Single Tree Encoding for Deep Inside-Outside Recursive Autoencoders
Andrew Drozdov | Subendhu Rongali | Yi-Pei Chen | Tim O’Gorman | Mohit Iyyer | Andrew McCallum
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
The deep inside-outside recursive autoencoder (DIORA; Drozdov et al. 2019) is a self-supervised neural model that learns to induce syntactic tree structures for input sentences *without access to labeled training data*. In this paper, we discover that while DIORA exhaustively encodes all possible binary trees of a sentence with a soft dynamic program, its vector averaging approach is locally greedy and cannot recover from errors when computing the highest-scoring parse tree in bottom-up chart parsing. To fix this issue, we introduce S-DIORA, an improved variant of DIORA that encodes a single tree rather than a softly-weighted mixture of trees by employing a hard argmax operation and a beam at each cell in the chart. Our experiments show that through *fine-tuning* a pre-trained DIORA with our new algorithm, we improve the state of the art in *unsupervised* constituency parsing on the English WSJ Penn Treebank by 2.2–6% F1, depending on the data used for fine-tuning.
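A simplified sketch of the single-tree encoding idea, assuming user-supplied compose and score functions over span vectors; the real S-DIORA additionally keeps a beam of candidates per chart cell, which is omitted here for brevity.

```python
def single_tree_chart(leaf_vectors, compose, score):
    """Bottom-up chart parsing where each cell keeps only the highest-scoring
    composition (hard argmax over split points) instead of a softly-weighted
    mixture of all sub-trees."""
    n = len(leaf_vectors)
    chart = {}  # (i, j) -> (vector, score) for the span [i, j)
    for i in range(n):
        chart[(i, i + 1)] = (leaf_vectors[i], 0.0)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best = None
            for k in range(i + 1, j):            # candidate split points
                left_vec, left_s = chart[(i, k)]
                right_vec, right_s = chart[(k, j)]
                vec = compose(left_vec, right_vec)  # e.g. an MLP composition cell
                s = left_s + right_s + score(vec)
                if best is None or s > best[1]:
                    best = (vec, s)               # hard argmax: keep a single tree
            chart[(i, j)] = best
    return chart[(0, n)]
```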
Compressing Transformer-Based Semantic Parsing Models using Compositional Code Embeddings
Prafull Prakash | Saurabh Kumar Shashidhar | Wenlong Zhao | Subendhu Rongali | Haidar Khan | Michael Kayser
Findings of the Association for Computational Linguistics: EMNLP 2020
The current state-of-the-art task-oriented semantic parsing models use BERT or RoBERTa as pretrained encoders; these models have huge memory footprints. This poses a challenge to their deployment for voice assistants such as Amazon Alexa and Google Assistant on edge devices with limited memory budgets. We propose to learn compositional code embeddings to greatly reduce the sizes of BERT-base and RoBERTa-base. We also apply the technique to DistilBERT, ALBERT-base, and ALBERT-large, three already-compressed BERT variants that attain similar state-of-the-art performance on semantic parsing with much smaller model sizes. We observe 95.15%–98.46% embedding compression rates and 20.47%–34.22% encoder compression rates, while preserving >97.5% of semantic parsing performance. We provide the recipe for training and analyze the trade-off between code embedding sizes and downstream performance.
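An illustrative sketch of compositional code embeddings for compressing an embedding table; the codebook sizes and the random codes below are placeholders, not the paper's learned values.

```python
import torch
import torch.nn as nn

class CompositionalCodeEmbedding(nn.Module):
    """Store each word as M small integer codes and reconstruct its vector as
    the sum of M codebook vectors, replacing the full |V| x d embedding matrix."""
    def __init__(self, vocab_size: int, num_codebooks: int = 8,
                 codebook_size: int = 16, dim: int = 768):
        super().__init__()
        # Discrete codes per word: (|V|, M), each entry in [0, K). In practice
        # these are learned to reconstruct the original embeddings; here they
        # are random placeholders.
        self.register_buffer(
            "codes", torch.randint(0, codebook_size, (vocab_size, num_codebooks)))
        # M codebooks of K vectors each: (M, K, d) -- the only large parameters.
        self.codebooks = nn.Parameter(
            torch.randn(num_codebooks, codebook_size, dim) * 0.02)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        codes = self.codes[token_ids]  # (..., M)
        # Gather one vector per codebook and sum them to reconstruct the embedding.
        gathered = torch.stack(
            [self.codebooks[m][codes[..., m]] for m in range(codes.shape[-1])],
            dim=-2)
        return gathered.sum(dim=-2)    # (..., d)

# A 30k x 768 float32 table (~92 MB) shrinks to 8 codes per word plus an
# 8 x 16 x 768 codebook, which is the kind of saving behind the reported
# embedding compression rates.
```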