Fatih Turkmen
2025
Memorization is Language-Sensitive: Analyzing Memorization and Inference Risks of LLMs in a Multilingual Setting
Ali Satvaty | Anna Visman | Dan Seidel | Suzan Verberne | Fatih Turkmen
Proceedings of the First Workshop on Large Language Model Memorization (L2M2)
Large Language Models (LLMs) are known to memorize and reproduce parts of their training data during inference, raising significant privacy and safety concerns. While this phenomenon has been extensively studied to explain its contributing factors and countermeasures, its implications in multilingual contexts remain largely unexplored. In this work, we investigate cross-lingual differences in memorization behaviors of multilingual LLMs. Specifically, we examine both discoverable memorization and susceptibility to perplexity ratio attacks using Pythia models of varying sizes, evaluated on two parallel multilingual datasets. Our results reveal that lower-resource languages consistently exhibit higher vulnerability to perplexity ratio attacks, indicating greater privacy risks. In contrast, patterns of discoverable memorization appear to be influenced more strongly by the model's pretraining or fine-tuning phases than by language resource level alone. These findings highlight the nuanced interplay between language resource availability and memorization in multilingual LLMs, providing insights toward developing safer and more privacy-preserving language models across diverse linguistic settings.
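The sketch below illustrates how a perplexity ratio check of the kind described in the abstract can be computed with Hugging Face transformers. The specific model names, the use of a smaller same-family model as the reference, and the decision threshold are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of a perplexity ratio membership check (illustrative only;
# model choices and the 0.8 threshold are assumptions, not the paper's setup).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text: str) -> float:
    """Perplexity of `text` under a causal LM (exp of the mean token NLL)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return torch.exp(loss).item()

# Target model under attack and a smaller reference model from the same family.
target = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m")
reference = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")
tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-410m")

candidate = "Some candidate sentence whose training-set membership we want to test."
ratio = perplexity(target, tok, candidate) / perplexity(reference, tok, candidate)

# A low ratio means the target is unusually confident on this sequence
# relative to the reference, which hints at memorization.
print(f"perplexity ratio: {ratio:.3f}",
      "-> likely member" if ratio < 0.8 else "-> likely non-member")
```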
2021
Using Confidential Data for Domain Adaptation of Neural Machine Translation
Sohyung Kim | Arianna Bisazza | Fatih Turkmen
Proceedings of the Third Workshop on Privacy in Natural Language Processing
We study the problem of domain adaptation in Neural Machine Translation (NMT) when domain-specific data cannot be shared due to confidentiality or copyright issues. As a first step, we propose to fragment data into phrase pairs and use a random sample to fine-tune a generic NMT model instead of the full sentences. Despite the loss of long segments for the sake of confidentiality protection, we find that NMT quality can considerably benefit from this adaptation, and that further gains can be obtained with a simple tagging technique.
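A rough sketch of the fragmentation idea is shown below: parallel sentences are broken into short fragment pairs and only a random sample is kept for fine-tuning. The fixed-length chunking and positional pairing are stand-ins for proper alignment-based phrase extraction, and the sampling rate is an arbitrary illustration rather than a value from the paper.

```python
# Minimal sketch: fragment confidential parallel data into short phrase-like
# pairs and keep a random sample for fine-tuning. A real system would extract
# phrase pairs from word alignments; naive chunking here is an assumption.
import random

def fragment(sentence: str, max_len: int = 4):
    """Split a sentence into consecutive chunks of at most `max_len` tokens."""
    tokens = sentence.split()
    return [" ".join(tokens[i:i + max_len]) for i in range(0, len(tokens), max_len)]

def build_finetuning_sample(parallel_corpus, sample_rate=0.3, seed=0):
    """Fragment each (source, target) pair and keep a random subset of fragments."""
    rng = random.Random(seed)
    fragments = []
    for src, tgt in parallel_corpus:
        # Pair fragments positionally; alignments would do this properly.
        for s, t in zip(fragment(src), fragment(tgt)):
            fragments.append((s, t))
    rng.shuffle(fragments)
    return fragments[: int(len(fragments) * sample_rate)]

corpus = [("the patient was given aspirin daily",
           "de patiënt kreeg dagelijks aspirine")]
print(build_finetuning_sample(corpus, sample_rate=1.0))
```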
Co-authors
- Arianna Bisazza 1
- Sohyung Kim 1
- Ali Satvaty 1
- Dan Seidel 1
- Suzan Verberne 1