Aly M. Kassem



2025

Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs
Aly M. Kassem | Omar Mahmoud | Niloofar Mireshghallah | Hyunwoo Kim | Yulia Tsvetkov | Yejin Choi | Sherif Saad | Santu Rana
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

In this paper, we investigate the overlooked impact of instruction-tuning on memorization in large language models (LLMs), which has largely been studied in base, pre-trained models. We propose a black-box prompt optimization method where an attacker LLM agent uncovers higher levels of memorization in a victim agent, surpassing traditional approaches that prompt the model directly with training data. Using an iterative rejection-sampling process, we design instruction-based prompts that minimize overlap with training data to avoid providing direct solutions while maximizing overlap between the victim’s output and the training data to induce memorization. Our method shows 23.7% more overlap with training data compared to state-of-the-art baselines. We explore two attack settings: an analytical approach that determines the empirical upper bound of the attack, both with and without access to responses for prompt initialization, and a practical classifier-based method for assessing memorization without access to memorized data. Our findings reveal that instruction-tuned models can expose pre-training data as much as, or more than, base models; contexts beyond the original training data can lead to leakage; and instructions generated by other LLMs open new avenues for automated attacks, which we believe require further exploration.
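The attack loop described in this abstract can be pictured as a simple rejection-sampling procedure. The sketch below is a minimal illustration under assumed interfaces: query_attacker and query_victim stand in for black-box LLM calls, the ROUGE-L function is a generic overlap measure, and the 0.2 rejection threshold is an invented placeholder; none of this is the authors' released code.

# Minimal sketch of the iterative rejection-sampling attack; all names
# (query_attacker, query_victim, thresholds) are hypothetical placeholders.
from typing import Tuple

def rouge_l(reference: str, hypothesis: str) -> float:
    """ROUGE-L F1: longest-common-subsequence overlap between token sequences."""
    x, y = reference.split(), hypothesis.split()
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, xt in enumerate(x):
        for j, yt in enumerate(y):
            dp[i + 1][j + 1] = dp[i][j] + 1 if xt == yt else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[len(x)][len(y)]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(y), lcs / len(x)
    return 2 * precision * recall / (precision + recall)

def attack_step(train_doc: str,
                prompt: str,
                query_attacker,   # attacker LLM: feedback prompt -> new instruction
                query_victim,     # victim LLM: instruction -> completion
                n_candidates: int = 8) -> Tuple[str, float]:
    """One rejection-sampling round: keep the candidate instruction that quotes
    the training document least while making the victim's output overlap it most."""
    best_prompt, best_score = prompt, -1.0
    for _ in range(n_candidates):
        candidate = query_attacker(
            "Rewrite this instruction so it does not quote the reference text, "
            "but still elicits it:\n" + prompt
        )
        # Reject candidates that leak the solution in the prompt itself.
        if rouge_l(train_doc, candidate) > 0.2:
            continue
        completion = query_victim(candidate)
        score = rouge_l(train_doc, completion)
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt, best_score

Each round keeps the instruction whose victim completion most overlaps the training document while discarding instructions that quote it, mirroring the minimize-prompt-overlap, maximize-output-overlap objective described in the abstract.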

2022

On the Arabic Dialects' Identification: Overcoming Challenges of Geographical Similarities Between Arabic Dialects and Imbalanced Datasets
Salma Jamal | Aly M. Kassem | Omar Mohamed | Ali Ashraf
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

Arabic is one of the world's richest languages, with a diverse range of dialects that vary by geographical origin. In this paper, we present our solution to Subtask 1 (country-level dialect identification) of the Nuanced Arabic Dialect Identification (NADI) shared task 2022, placing third with an average macro-F1 score of 26.44% across the two test sets. In the preprocessing stage, we removed the most frequent terms shared by all dialects, and in the modeling stage, we employed a hybrid loss that combines weighted cross-entropy loss with Vector Scaling (VS) loss. On test sets A and B, our model achieved macro-F1 scores of 35.68% and 17.192%, respectively.
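For illustration, the hybrid objective can be sketched as below. This is a hedged reconstruction, assuming the standard VS-loss form of Kini et al. (2021), which applies per-class multiplicative (delta) and additive (iota) logit adjustments derived from class frequencies; the hyperparameters gamma and tau and the mixing weight alpha are placeholders, not the values used in the paper.

# Minimal sketch of a weighted cross-entropy + Vector Scaling (VS) hybrid loss;
# hyperparameter values are illustrative assumptions, not the paper's settings.
import torch
import torch.nn.functional as F

def hybrid_loss(logits: torch.Tensor,       # shape (batch, num_classes)
                targets: torch.Tensor,      # shape (batch,)
                class_counts: torch.Tensor, # shape (num_classes,)
                gamma: float = 0.3,
                tau: float = 1.0,
                alpha: float = 0.5) -> torch.Tensor:
    counts = class_counts.float()
    # Weighted cross-entropy: rarer dialects receive larger class weights.
    weights = counts.sum() / (len(counts) * counts)
    wce = F.cross_entropy(logits, targets, weight=weights)
    # VS loss: scale logits by (n_c / n_max)^gamma, shift by tau * log(class prior).
    delta = (counts / counts.max()) ** gamma
    iota = tau * torch.log(counts / counts.sum())
    vs = F.cross_entropy(logits * delta + iota, targets)
    # Convex combination of the two terms.
    return alpha * wce + (1 - alpha) * vs

Both terms push the classifier toward minority dialects, but in complementary ways: the weighted cross-entropy reweights examples, while the VS adjustments reshape the decision margins themselves, which is why combining them can help on heavily imbalanced label distributions.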