Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models
Hila Gonen, Terra Blevins, Alisa Liu, Luke Zettlemoyer, Noah A. Smith
Abstract
Despite their wide adoption, the biases and unintended behaviors of language models remain poorly understood. In this paper, we identify and characterize a phenomenon never discussed before, which we call semantic leakage, where models leak irrelevant information from the prompt into the generation in unexpected ways. We propose an evaluation setting to detect semantic leakage both by humans and automatically, curate a diverse test suite for diagnosing this behavior, and measure significant semantic leakage in 13 flagship models. We also show that models exhibit semantic leakage in languages besides English and across different settings and generation scenarios. This discovery highlights yet another type of bias in language models that affects their generation patterns and behavior.
- Anthology ID:
- 2025.naacl-long.35
- Volume:
- Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
- Month:
- April
- Year:
- 2025
- Address:
- Albuquerque, New Mexico
- Editors:
- Luis Chiruzzo, Alan Ritter, Lu Wang
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 785–798
- URL:
- https://preview.aclanthology.org/Author-page-Marten-During-lu/2025.naacl-long.35/
- Cite (ACL):
- Hila Gonen, Terra Blevins, Alisa Liu, Luke Zettlemoyer, and Noah A. Smith. 2025. Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 785–798, Albuquerque, New Mexico. Association for Computational Linguistics.
- Cite (Informal):
- Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models (Gonen et al., NAACL 2025)
- PDF:
- https://preview.aclanthology.org/Author-page-Marten-During-lu/2025.naacl-long.35.pdf