Hoorieh Sabzevari


2025

NLPART at SemEval-2025 Task 4: Forgetting is harder than Learning
Hoorieh Sabzevari | Milad Molazadeh Oskuee | Tohid Abedini | Ghazal Zamaninejad | Sara Baruni | Zahra Amirmahani | Amirmohammad Salehoof
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Unlearning is a critical capability for ensuring privacy, security, and compliance in AI systems, enabling models to forget specific data while retaining overall performance. In this work, we participated in Task 4 of SemEval 2025, which focused on unlearning across three sub-tasks: (1) long-form synthetic creative documents, (2) short-form synthetic biographies containing personally identifiable information, and (3) real documents sampled from the target model’s training dataset. We conducted four experiments, employing Supervised Fine-Tuning (SFT) and Negative Preference Optimization (NPO). Despite achieving good performance on the retain set—data that the model was supposed to remember—our findings demonstrate that these techniques did not perform well on the forget set, where unlearning was required.
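The abstract above mentions Negative Preference Optimization (NPO) as one of the unlearning techniques tried. As a rough illustration only (this is not the authors' code), the standard NPO objective on the forget set penalizes the current model for assigning high likelihood to forget-set sequences relative to a frozen reference model. A minimal sketch, assuming per-sequence log-probabilities are already computed and using an illustrative beta value:

```python
import math

def npo_loss(logp_theta, logp_ref, beta=0.1):
    """Negative Preference Optimization loss over a batch of forget-set
    examples. logp_theta: per-sequence log-probs under the model being
    unlearned; logp_ref: log-probs under the frozen reference model.
    Per example: (2 / beta) * log(1 + exp(beta * (logp_theta - logp_ref))),
    i.e. a softplus of the scaled log-likelihood ratio; driving the
    model's likelihood on forget data below the reference lowers the loss.
    """
    losses = [
        (2.0 / beta) * math.log1p(math.exp(beta * (lt - lr)))
        for lt, lr in zip(logp_theta, logp_ref)
    ]
    return sum(losses) / len(losses)
```

When the model matches the reference exactly, each term reduces to (2/beta) * log 2; as the model's log-probability on forget-set text drops below the reference's, the loss falls toward zero, which is the intended "forgetting" pressure.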

2024

eagerlearners at SemEval2024 Task 5: The Legal Argument Reasoning Task in Civil Procedure
Hoorieh Sabzevari | Mohammadmostafa Rostamkhani | Sauleh Eetemadi
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This study investigates the performance of the zero-shot method in classifying data using three large language models, alongside two models with large input token limits and two models pre-trained on legal data. Our main dataset comes from the domain of U.S. civil procedure. It includes summaries of legal cases, specific questions, potential answers, and detailed explanations of why each solution is relevant, all sourced from a book aimed at law students. By comparing these methods, we aimed to understand how effectively they handle the complexities found in legal datasets. Our findings show how well large language models in the zero-shot setting can understand complicated legal data. We achieved our highest F1 score of 64% in these experiments.
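The dataset described above pairs a case summary, a question, and a candidate answer, and the zero-shot setup asks a language model to judge the candidate without task-specific training. A minimal sketch of how such a triple might be formatted as a zero-shot classification prompt; the wording and function name are illustrative, not the authors' actual prompt:

```python
def build_zero_shot_prompt(case_summary, question, candidate_answer):
    """Format one (case summary, question, candidate answer) triple as a
    zero-shot binary-classification prompt: the model is asked to label
    the candidate answer as correct or incorrect, with no examples given.
    The exact wording here is a hypothetical illustration.
    """
    return (
        "You are assisting with a U.S. civil procedure exam.\n"
        f"Case summary: {case_summary}\n"
        f"Question: {question}\n"
        f"Candidate answer: {candidate_answer}\n"
        "Is the candidate answer correct? Reply with 'yes' or 'no'."
    )
```

The model's yes/no reply can then be scored against the gold label per candidate answer, which is how an F1 score such as the 64% reported above would be computed.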