Fine-Grained Transfer Learning for Harmful Content Detection through Label-Specific Soft Prompt Tuning

Faeze Ghorbanpour, Viktor Hangya, Alexander Fraser


Abstract
The spread of harmful content online is a dynamic issue evolving over time. Existing detection models, reliant on static data, are becoming less effective and generalizable. Developing new models requires sufficient up-to-date data, which is challenging to obtain. A potential solution is to combine existing datasets with minimal new data. However, detection tasks vary—some focus on hate speech, offensive language, or abusive content, which differ in the intent to harm, while others focus on identifying targets of harmful speech such as racism, sexism, etc.—raising the challenge of handling nuanced class differences. To address these issues, we introduce a novel transfer learning method that leverages class-specific knowledge to enhance harmful content detection. First, we present label-specific soft prompt tuning, which captures and represents class-level information. Second, we propose two approaches to transfer this fine-grained knowledge from source (existing tasks) to target (unseen and new tasks): initializing the target task prompts from source prompts, and using an attention mechanism that learns and adjusts attention scores to utilize the most relevant information from source prompts. Experiments demonstrate significant improvements in harmful content detection across English and German datasets, highlighting the effectiveness of label-specific representations and knowledge transfer.
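The two transfer approaches described in the abstract can be sketched in a minimal numpy example. This is an illustrative reconstruction, not the authors' implementation: the shapes, the `softmax` helper, and the fixed attention logits are assumptions; in the paper the attention scores would be learned during target-task tuning.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Label-specific soft prompts for 4 hypothetical source classes:
# shape [num_source_labels, prompt_len, embedding_dim]
source_prompts = rng.normal(scale=0.02, size=(4, 8, 16))

# Approach 1: initialize a target prompt directly from a source prompt,
# e.g. reuse the prompt of a related source class as the starting point.
target_init = source_prompts[0].copy()

# Approach 2: form each target prompt as an attention-weighted mixture of
# source prompts; logits are zero here (uniform weights) for illustration.
attn_logits = np.zeros((2, 4))           # [num_target_labels, num_source_labels]
weights = softmax(attn_logits, axis=-1)  # each row sums to 1
target_prompts = np.einsum("ts,spd->tpd", weights, source_prompts)

print(target_prompts.shape)  # (2, 8, 16)
```

With uniform weights each target prompt is simply the mean of the source prompts; during tuning, the learned logits would sharpen toward the most relevant source classes.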
Anthology ID:
2025.naacl-long.551
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
11047–11061
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.551/
Cite (ACL):
Faeze Ghorbanpour, Viktor Hangya, and Alexander Fraser. 2025. Fine-Grained Transfer Learning for Harmful Content Detection through Label-Specific Soft Prompt Tuning. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 11047–11061, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Fine-Grained Transfer Learning for Harmful Content Detection through Label-Specific Soft Prompt Tuning (Ghorbanpour et al., NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.551.pdf