Mengyu Ye
2025
Can Input Attributions Explain Inductive Reasoning in In-Context Learning?
Mengyu Ye | Tatsuki Kuribayashi | Goro Kobayashi | Jun Suzuki
Findings of the Association for Computational Linguistics: ACL 2025
Interpreting the internal processes of neural models has long been a challenge, and it remains relevant in the era of large language models (LLMs) and in-context learning (ICL); for example, ICL raises the new question of which of the few-shot examples contributed to identifying and solving the task. To this end, we design synthetic diagnostic tasks of inductive reasoning, inspired by generalization tests in linguistics: most in-context examples are ambiguous with respect to their underlying rule, and one critical example disambiguates the demonstrated task. The question is whether conventional input attribution (IA) methods can track such a reasoning process in ICL, i.e., identify the influential example. Our experiments yield several practical findings; for example, a certain simple IA method works best, and the larger the model, the harder it generally becomes to interpret ICL with gradient-based IA methods.
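As a purely illustrative aid (not the paper's actual setup), the sketch below shows one common gradient-based IA variant applied to few-shot demonstrations with a HuggingFace causal LM: embedding-gradient norms summed per demonstration. The model name ("gpt2"), the toy wug-style demonstrations, and the scoring choice are assumptions made only for this example.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Hypothetical few-shot demonstrations and query (not the paper's tasks).
demos = ["wug -> wugs", "blick -> blicks", "dax -> daxes"]
query = "fep ->"
segments = [d + "\n" for d in demos] + [query]

# Tokenize each segment separately so the token span of each demonstration is known.
seg_ids = [tok(s, return_tensors="pt").input_ids[0] for s in segments]
input_ids = torch.cat(seg_ids).unsqueeze(0)

# Run the model on input embeddings that require gradients.
embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)
logits = model(inputs_embeds=embeds).logits

# Score: the logit of the model's top next-token prediction at the final position.
last = logits[0, -1]
last[last.argmax()].backward()

# Per-token attribution = L2 norm of the embedding gradient (one simple IA variant);
# aggregate token scores within each demonstration.
token_scores = embeds.grad[0].norm(dim=-1)
start = 0
for seg, ids in zip(segments, seg_ids):
    score = token_scores[start : start + len(ids)].sum().item()
    print(f"{seg.strip()!r}: {score:.4f}")
    start += len(ids)
```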
2023
Assessing Step-by-Step Reasoning against Lexical Negation: A Case Study on Syllogism
Mengyu Ye | Tatsuki Kuribayashi | Jun Suzuki | Goro Kobayashi | Hiroaki Funayama
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) benefit from step-by-step reasoning instructions, e.g., chain-of-thought (CoT) prompting. Building on this, how robustly they can perform CoT-style reasoning is of interest from a probing perspective. In this study, we inspect the step-by-step reasoning ability of LLMs with a focus on negation, a core linguistic phenomenon that is difficult to process. In particular, we introduce several controlled settings (e.g., reasoning about fictional entities) to evaluate the models' logical reasoning abilities. We observed that dozens of modern LLMs were not robust to lexical negation (e.g., plausible→implausible) when performing CoT-style reasoning, and the results highlight unique limitations in each LLM family.