YNU at SemEval-2025 Task 4: Synthetic Token Alternative Training for LLM Unlearning

Yang Chen, Zheyang Luo, Zhiwen Tang


Abstract
This paper describes our system submitted to SemEval-2025 Task 4, which introduces the Synthetic Token Alternative Training (STAT) algorithm for efficient unlearning in large language models (LLMs). The proposed method aims to enable pretrained models to selectively forget designated data (the forget set) while preserving performance on the remaining data (the retain set). The STAT framework adopts a dual-stage process. In the first stage, pseudo tokens are generated through random sampling and applied to the forget set, facilitating more effective targeted unlearning. In the second stage, the model undergoes gradient-based optimization using an alternating training scheme that switches between pseudo-token-augmented samples from the forget set and unmodified samples from the retain set. This design promotes stable unlearning of the specified data while accelerating convergence and preserving the model’s general performance. Our system achieved 3rd place in the 7B model track (OLMo-7B) and 7th place in the 1B model track (OLMo-1B), demonstrating substantial improvements over the official baselines, greater stability in knowledge retention, and more effective targeted forgetting than existing approaches.
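The abstract describes STAT only at a high level. The following is a minimal PyTorch sketch of the two stages as the abstract presents them; the function names (make_pseudo_targets, stat_step, stat_train), the replace_prob parameter, and the HuggingFace-style causal-LM interface (a forward call taking input_ids and labels and returning an object with a .loss) are illustrative assumptions, not the authors' implementation.

```python
import torch

def make_pseudo_targets(input_ids, vocab_size, replace_prob=0.5):
    # Stage 1 (sketch): randomly sample pseudo tokens and swap them into
    # the forget-set labels, so training steers the model toward corrupted
    # targets instead of the original content. replace_prob is assumed.
    labels = input_ids.clone()
    mask = torch.rand_like(labels, dtype=torch.float) < replace_prob
    random_tokens = torch.randint_like(labels, vocab_size)
    labels[mask] = random_tokens[mask]
    return labels

def stat_step(model, optimizer, input_ids, labels):
    # One gradient step on a (possibly pseudo-token-augmented) batch,
    # assuming a HuggingFace-style causal LM that returns a loss.
    outputs = model(input_ids=input_ids, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()

def stat_train(model, optimizer, forget_batches, retain_batches, vocab_size):
    # Stage 2 (sketch): alternate between pseudo-token forget batches and
    # unmodified retain batches, per the abstract's alternating scheme.
    for forget_ids, retain_ids in zip(forget_batches, retain_batches):
        pseudo_labels = make_pseudo_targets(forget_ids, vocab_size)
        stat_step(model, optimizer, forget_ids, pseudo_labels)  # targeted forgetting
        stat_step(model, optimizer, retain_ids, retain_ids)     # preserve retain set
```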
Anthology ID:
2025.semeval-1.264
Volume:
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
Venues:
SemEval | WS
Publisher:
Association for Computational Linguistics
Pages:
2038–2043
URL:
https://preview.aclanthology.org/transition-to-people-yaml/2025.semeval-1.264/
Cite (ACL):
Yang Chen, Zheyang Luo, and Zhiwen Tang. 2025. YNU at SemEval-2025 Task 4: Synthetic Token Alternative Training for LLM Unlearning. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 2038–2043, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
YNU at SemEval-2025 Task 4: Synthetic Token Alternative Training for LLM Unlearning (Chen et al., SemEval 2025)
PDF:
https://preview.aclanthology.org/transition-to-people-yaml/2025.semeval-1.264.pdf