Ananth Agarwal
2025
Mechanisms vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations
Ananth Agarwal | Jasper Jian | Christopher D. Manning | Shikhar Murty
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) exhibit a robust mastery of syntax when processing and generating text. While this suggests internalized understanding of hierarchical syntax and dependency relations, the precise mechanism by which they represent syntactic structure is an open area within interpretability research. Probing provides one way to identify syntactic mechanisms linearly encoded in activations; however, no comprehensive study has yet established whether a model’s probing accuracy reliably predicts its downstream syntactic performance. Adopting a “mechanisms vs. outcomes” framework, we evaluate 32 open-weight transformer models and find that syntactic features extracted via probing fail to predict outcomes of targeted syntax evaluations across English linguistic phenomena. Our results highlight a substantial disconnect between latent syntactic representations found via probing and observable syntactic behaviors in downstream tasks.
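For context on the method, a linear probe is simply a lightweight classifier trained on frozen model activations; its held-out accuracy is read as evidence that a feature is linearly encoded. The sketch below illustrates that setup with placeholder data. The array shapes, label scheme, and variable names are hypothetical, not the paper's code.

```python
# Minimal linear-probe sketch (illustrative only, not the paper's code).
# Assumes `activations` is an (n_tokens, hidden_dim) array of frozen LM
# hidden states and `labels` holds one syntactic annotation per token,
# e.g. the dependency relation of the token to its head.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
activations = rng.normal(size=(1000, 768))  # placeholder hidden states
labels = rng.integers(0, 5, size=1000)      # placeholder syntactic labels

X_tr, X_te, y_tr, y_te = train_test_split(activations, labels, random_state=0)

# A linear probe is just a linear classifier on frozen activations; high
# held-out accuracy is taken as evidence the feature is linearly encoded.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out probing accuracy:", probe.score(X_te, y_te))
```

The paper's question is whether accuracy from probes like this one predicts behavior on targeted syntactic evaluations; the finding reported above is that it does not.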
2022
Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference
Eric Mitchell | Joseph Noh | Siyan Li | Will Armstrong | Ananth Agarwal | Patrick Liu | Chelsea Finn | Christopher D. Manning
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
While large pre-trained language models are powerful, their predictions often lack logical consistency across test inputs. For example, a state-of-the-art Macaw question-answering (QA) model answers "Yes" to "Is a sparrow a bird?" and "Does a bird have feet?" but answers "No" to "Does a sparrow have feet?". To address this failure mode, we propose a framework, Consistency Correction through Relation Detection, or ConCoRD, for boosting the consistency and accuracy of pre-trained NLP models using pre-trained natural language inference (NLI) models without fine-tuning or re-training. Given a batch of test inputs, ConCoRD samples several candidate outputs for each input and instantiates a factor graph that accounts for both the model's belief about the likelihood of each answer choice in isolation and the NLI model's beliefs about pairwise answer choice compatibility. We show that a weighted MaxSAT solver can efficiently compute high-quality answer choices under this factor graph, improving over the raw model's predictions. Our experiments demonstrate that ConCoRD consistently boosts accuracy and consistency of off-the-shelf closed-book QA and VQA models using off-the-shelf NLI models, notably increasing accuracy of LXMERT on ConVQA by 5% absolute. See the project website (https://ericmitchell.ai/emnlp-2022-concord/) for code and data.
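To make the factor-graph idea concrete, here is a toy sketch of ConCoRD-style inference. It is not the authors' implementation: the beliefs, the stubbed compatibility function, and the single three-way factor are illustrative simplifications (ConCoRD uses pairwise NLI factors and a weighted MaxSAT solver; brute force suffices at this scale).

```python
# Toy sketch of ConCoRD-style consistent decoding (illustrative only).
# Each question has candidate answers with base-model confidences; an NLI
# model (stubbed here) supplies compatibility scores between answers. We pick
# the joint assignment maximizing model belief plus compatibility.
from itertools import product
import math

# Hypothetical base-model beliefs P(answer | question).
beliefs = {
    "sparrow_is_bird":  {"Yes": 0.9, "No": 0.1},
    "bird_has_feet":    {"Yes": 0.8, "No": 0.2},
    "sparrow_has_feet": {"Yes": 0.4, "No": 0.6},  # raw model gets this wrong
}

def compatibility(assignment):
    """Stub for NLI-derived factors, collapsed into one hand-coded three-way
    factor for brevity: penalize affirming both premises while denying the
    conclusion. A real system scores statement pairs with an NLI model."""
    if (assignment["sparrow_is_bird"] == "Yes"
            and assignment["bird_has_feet"] == "Yes"
            and assignment["sparrow_has_feet"] == "No"):
        return -5.0  # strong inconsistency penalty
    return 0.0

# Enumerate joint assignments and maximize log-belief plus compatibility.
assignments = (dict(zip(beliefs, combo))
               for combo in product(*(list(ans) for ans in beliefs.values())))
best = max(assignments,
           key=lambda a: sum(math.log(beliefs[q][a[q]]) for q in a)
                         + compatibility(a))
print(best)  # all "Yes": the consistency factor flips the third answer
```

With the inconsistency penalty in place, the highest-scoring joint assignment flips the third answer to "Yes", matching the consistency behavior the abstract describes.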
Co-authors
- Christopher D. Manning (2)
- Will Armstrong (1)
- Chelsea Finn (1)
- Jasper Jian (1)
- Siyan Li (1)