Pierre Baldi

2025

Memorization: A Close Look at Books
Iris Ma | Ian Domingo | Alberto Krone-Martins | Pierre Baldi | Cristina Lopes
Proceedings of the First Workshop on Large Language Model Memorization (L2M2)

To what extent can entire books be extracted from LLMs? Using the Llama 3 70B family of models and the “prefix-prompting” extraction technique, we were able to auto-regressively reconstruct, with a very high level of similarity, one entire book (Alice’s Adventures in Wonderland) from just the first 500 tokens. We were also able to obtain high extraction rates on several other books, piece-wise. However, these successes do not extend uniformly to all books. We show that extraction rates of books correlate with book popularity and thus, likely, with duplication in the training data. We also confirm the undoing of mitigations in the instruction-tuned Llama 3.1, following recent work (Nasr et al., 2025). We further find that this undoing comes from changes to only a tiny fraction of weights concentrated primarily in the lower transformer blocks. Our results provide evidence of the limits of current regurgitation mitigation strategies and introduce a framework for studying how fine-tuning affects the retrieval of verbatim memorization in aligned LLMs.

2024

Selective Perception: Learning Concise State Descriptions for Language Model Actors
Kolby Nottingham | Yasaman Razeghi | Kyungmin Kim | Jb Lanier | Pierre Baldi | Roy Fox | Sameer Singh
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

The latest large language models (LMs) support increasingly longer contexts. While this trend permits using substantial amounts of text with SOTA LMs, requiring these large LMs to process potentially redundant or irrelevant data needlessly increases inference time and cost. To remedy this problem, we propose BLINDER, a method that leverages a small finetuned LM to sample the minimal set of input features that maximizes the performance of a downstream LM. BLINDER trains an LM with a value head to estimate the likelihood of optimal outputs from a downstream LM given an input. We evaluate BLINDER on embodied decision making tasks with notoriously verbose state descriptions: NetHack and robot planning. BLINDER reduces the length of LM actor input by 87% and 99% while improving task success rates by 158% and 54% on NetHack and robot planning, respectively, which represents substantial inference cost savings while actually increasing performance.
