Muhan Gao
2025
ICL CIPHERS: Quantifying "Learning" in In-Context Learning via Substitution Ciphers
Zhouxiang Fang | Aayush Mishra | Muhan Gao | Anqi Liu | Daniel Khashabi
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Recent works have suggested that In-Context Learning (ICL) operates in dual modes, i.e., task retrieval (remembering learned patterns from pre-training) and task learning (inference-time "learning" from demonstrations). However, disentangling these two modes remains a challenging goal. We introduce ICL CIPHERS, a class of task reformulations based on substitution ciphers borrowed from classic cryptography. In this approach, a subset of tokens in the in-context inputs is substituted with other (irrelevant) tokens, rendering English sentences less comprehensible to the human eye. However, by design, there is a latent, fixed pattern to this substitution, making it reversible. This bijective (reversible) cipher ensures that the task remains well-defined in some abstract sense, despite the transformation. It is an intriguing question whether LLMs can solve tasks reformulated by ICL CIPHERS with a BIJECTIVE mapping, which requires "deciphering" the latent cipher. We show that LLMs are better at solving tasks reformulated by ICL CIPHERS with BIJECTIVE mappings than with the NON-BIJECTIVE (irreversible) baseline, providing a novel approach to quantify "learning" in ICL. Although this gap is small, it is consistent across four datasets and six models. Finally, our interpretability analysis shows evidence that LLMs can internally decode ciphered inputs.
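As a rough illustration of the kind of substitution described above (a minimal sketch, not the paper's released code: the word-level vocabulary, substitution rate, and function names are assumed for the example), a bijective cipher can be realized as a fixed permutation of a vocabulary, while the irreversible baseline samples substitutes independently so that several tokens may collapse onto one:

```python
import random

def build_bijective_cipher(vocab, seed=0):
    # A fixed random permutation of the vocabulary gives a reversible (bijective) mapping.
    rng = random.Random(seed)
    shuffled = vocab[:]
    rng.shuffle(shuffled)
    return dict(zip(vocab, shuffled))

def build_non_bijective_cipher(vocab, seed=0):
    # Irreversible baseline: each token maps to an independently sampled token,
    # so the mapping is generally many-to-one and cannot be inverted.
    rng = random.Random(seed)
    return {tok: rng.choice(vocab) for tok in vocab}

def apply_cipher(tokens, cipher, rate=0.5, seed=0):
    # Substitute a fixed fraction of the in-context tokens; the rest stay intact.
    rng = random.Random(seed)
    return [cipher[t] if t in cipher and rng.random() < rate else t for t in tokens]

# Toy usage on a word-level "vocabulary" (illustrative only).
vocab = ["good", "bad", "movie", "plot", "boring", "great"]
cipher = build_bijective_cipher(vocab)
print(apply_cipher("the movie had a great plot".split(), cipher))
```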
2024
Insights into LLM Long-Context Failures: When Transformers Know but Don’t Tell
Muhan Gao | TaiMing Lu | Kuai Yu | Adam Byerly | Daniel Khashabi
Findings of the Association for Computational Linguistics: EMNLP 2024
Large Language Models (LLMs) exhibit positional bias, struggling to utilize information from the middle or end of long contexts. Our study explores LLMs’ long-context reasoning by probing their hidden representations. We find that while LLMs encode the position of target information, they often fail to leverage this in generating accurate responses. This reveals a disconnect between information retrieval and utilization, a “know but don’t tell” phenomenon. We further analyze the relationship between extraction time and final accuracy, offering insights into the underlying mechanics of transformer models.
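A minimal sketch of the kind of probing mentioned above, assuming hidden states have already been collected from a chosen layer while the model reads a long context; the linear probe, labels, and names here are illustrative assumptions, not the paper's actual setup:

```python
import torch
import torch.nn as nn

def train_position_probe(X, y, num_positions, epochs=100, lr=1e-3):
    # Linear probe: predict which passage holds the target information
    # from a hidden-state vector (X: [num_examples, hidden_dim], y: passage index).
    probe = nn.Linear(X.shape[1], num_positions)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(probe(X), y)
        loss.backward()
        opt.step()
    return probe

# Synthetic data standing in for collected hidden states (illustrative only).
X = torch.randn(200, 64)
y = torch.randint(0, 10, (200,))
probe = train_position_probe(X, y, num_positions=10)
```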