Yu-Chieh Wang
2026
Lexical Familiarity Predicts Processing Depth for Nonliteral Language in Large Language Models
Lang-Ching Yeh | Yu-Chieh Wang | Shu-Kai Hsieh
Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026)
This paper investigates how large language models internally process nonliteral language. Analyzing five categories spanning slang, metaphor, and idioms across all 48 layers of Gemma-3-12B-IT with Gemma Scope 2 sparse autoencoders, we find a lexical familiarity gradient: processing depth depends on available prior lexical knowledge, not figurative type. Idioms diverge at layer 1 as entrenched units; expressions built from familiar words (metaphors, semantic-shift and constructional slang) converge at layers 7–9; neologisms peak at layer 41, activating 3× more unique features. Paraphrase residual analysis confirms strong signals only at the gradient endpoints, yielding a three-tier hierarchy of entrenched retrieval, known-word reanalysis, and novel-word construction. Crucially, this peak-layer structure replicates in base models (Gemma-PT, Qwen-Base), demonstrating that the gradient is a robust property of pretrained representations rather than an alignment artifact. We additionally identify an activation density confound in SAE feature counts that produces spurious cross-condition convergence. Overall, processing depth is better predicted by lexical familiarity than by figurative type, with implications for robustness to non-standard language and for SAE-based interpretability.