Karthik Srikumar
2025
Beyond the Haystack: Sensitivity to Context in Legal Reference Recall
Eric Xia | Karthik Srikumar | Keshav Karthik | Advaith Renjith | Ashwinee Panda
Proceedings of the Natural Legal Language Processing Workshop 2025
Reference retrieval is critical for many applications in the legal domain, for instance in determining which case texts support a particular claim. However, existing benchmarking methods do not enable rigorous evaluation of recall capabilities in previously unseen contexts. We develop an evaluation framework from U.S. court opinions which ensures models have no prior knowledge of case results or context. Applying our framework, we identify a consistent gap across models and tasks between traditional needle-in-a-haystack retrieval and actual performance in legal recall. Our work shows that standard needle-in-a-haystack benchmarks consistently overestimate recall performance in the legal domain. By isolating the causes of performance degradation to contextual informativity rather than distributional differences, our findings highlight the need for specialized testing in reference-critical applications, and establish an evaluation framework for improving retrieval across informativity levels.
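
For illustration only, the following is a minimal sketch of the generic needle-in-a-haystack-style recall check that the abstract contrasts with; it is not the paper's evaluation framework, and the names build_context, recall_at_depths, and ask_model are hypothetical.

# Hypothetical sketch (not the paper's framework): a generic needle-in-a-haystack
# recall check, where a target passage is planted among distractor passages and
# recall is scored by whether the model's answer reproduces the target's key fact.

def build_context(needle: str, distractors: list[str], depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end) among distractor passages."""
    pos = int(len(distractors) * depth)
    return " ".join(distractors[:pos] + [needle] + distractors[pos:])

def recall_at_depths(ask_model, needle: str, key_fact: str, distractors: list[str],
                     depths=(0.0, 0.25, 0.5, 0.75, 1.0)) -> dict[float, bool]:
    """ask_model(context, question) -> str is any callable wrapping a language model."""
    question = "Which passage supports the claim? Quote it."
    return {
        d: key_fact.lower() in ask_model(build_context(needle, distractors, d), question).lower()
        for d in depths
    }

In such a setup the needle is near-verbatim and highly distinctive, which is one reason scores on it can overestimate recall in settings, like legal reference recall, where the surrounding context is less informative about the target.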
These Aren’t the Vectors You’re Looking For: A Proof of Quantum Advantage in Compositional Generalization
Karthik Srikumar
Proceedings of the QuantumNLP: Integrating Quantum Computing with Natural Language Processing
Compositional generalization, the ability to systematically combine known concepts to understand and produce novel expressions, remains a fundamental, unsolved challenge for classical neural language models, whose reliance on statistical correlations in high-dimensional vector spaces inherently limits them. This paper establishes the first rigorous theoretical guarantee of an exponential quantum advantage for compositional generalization. We prove that classical language models, which represent concepts as vectors in ℝ^d, require a latent dimension scaling linearly with the number of concepts and compositional rules to avoid catastrophic interference. In contrast, we introduce the Quantum Compositional Embedding (QCE) framework, which leverages the intrinsic properties of quantum mechanics. In doing so, we demonstrate that QCE, utilizing only a logarithmic number of qubits, can perfectly represent and generalize compositional structures, a task provably impossible for classical models of equivalent dimensionality. The separation is proven to be exponential, providing a compelling theoretical foundation for quantum natural language processing.
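
One plausible reading of the abstract's scaling claim can be summarized as follows, where n and r are illustrative stand-ins for the number of concepts and compositional rules (the precise statement and notation are in the paper):

\[
  d_{\mathrm{classical}} \;=\; \Omega(n + r)
  \qquad\text{vs.}\qquad
  q_{\mathrm{QCE}} \;=\; O(\log n)
\]

That is, the classical embedding dimension must grow linearly in the number of concepts and rules, while the number of qubits grows only logarithmically, which is the claimed exponential gap in representation size.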