Emir Kaan Korukluoglu


2025

OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature
Alisha Srivastava | Emir Kaan Korukluoglu | Minh Nhat Le | Duyen Tran | Chau Minh Pham | Marzena Karpinska | Mohit Iyyer
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) are known to memorize and recall English text from their pretraining data. However, the extent to which this ability generalizes to non-English languages or transfers across languages remains unclear. This paper investigates multilingual and cross-lingual memorization in LLMs, probing whether memorized content in one language (e.g., English) can be recalled when presented in translation. To do so, we introduce OWL, a dataset of **31.5K** aligned excerpts from 20 books in ten languages, including English originals, official translations (Vietnamese, Spanish, Turkish), and new translations in six low-resource languages (Sesotho, Yoruba, Maithili, Malagasy, Setswana, Tahitian). We evaluate memorization across model families and sizes through three tasks: (1) **direct probing**, which asks the model to identify a book’s title and author; (2) **name cloze**, which requires predicting masked character names; and (3) **prefix probing**, which involves generating continuations. We find that some LLMs consistently recall content across languages, even for texts without existing translations. GPT-4o, for example, identifies authors and titles 69.4% of the time and masked entities 6.3% of the time in newly translated excerpts. While perturbations (e.g., masking characters, shuffling words) reduce accuracy, the model’s performance remains above chance. Our results highlight the extent of cross-lingual memorization and provide insight into the differences between models.
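The three probing tasks described in the abstract are prompt-based. As a rough illustration of the name-cloze setup, the sketch below builds a masked-entity prompt and scores exact-match predictions; the passage, the `[MASK]` convention, and the helper functions are hypothetical examples, not the paper's actual code or prompts.

```python
# Illustrative sketch of a name-cloze probe (assumed format, not the paper's code).

def build_name_cloze_prompt(excerpt_with_mask: str) -> str:
    """Ask the model to fill in the single masked character name."""
    return (
        "The following passage comes from a published novel. "
        "Exactly one character name has been replaced with [MASK]. "
        "Reply with only the missing name.\n\n"
        f"Passage: {excerpt_with_mask}\nAnswer:"
    )


def name_cloze_accuracy(predictions: list[str], gold_names: list[str]) -> float:
    """Exact-match accuracy over masked-name predictions (case-insensitive)."""
    if not gold_names:
        return 0.0
    correct = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, gold_names)
    )
    return correct / len(gold_names)


if __name__ == "__main__":
    # Public-domain example passage (Pride and Prejudice), used only for illustration.
    prompt = build_name_cloze_prompt(
        "“You must allow me to tell you how ardently I admire and love you,” said [MASK]."
    )
    print(prompt)
    print(name_cloze_accuracy(["Darcy"], ["Darcy"]))  # 1.0
```

Direct probing and prefix probing would follow the same pattern, swapping in a prompt that asks for the book's title and author, or that supplies an excerpt prefix and compares the generated continuation against the reference text.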