Reinforcement Learning–Guided Adaptive Tuning for Out-of-Distribution Harmful Text Detection
Mengyu Xiang, Tinghao Chen, Boxu Han, Qiudan Li, Shu Wu, Daniel Dajun Zeng
Abstract
As social media grows, harmful information spreads rapidly across platforms and evolves over time, exhibiting cross-platform and cross-temporal variations. Existing methods rely on model parameters that are fixed once training ends and therefore fail to handle substantial semantic discrepancies, leading to out-of-distribution (OOD) problems. While test-time tuning enables dynamic parameter adjustment, it may lead to excessive adaptation to individual samples. The key challenge is how to adapt to semantic variations during testing while preventing overfitting from continuous tuning. To tackle this issue, this paper proposes RLAT, a reinforcement learning (RL)–guided adaptive tuning method for harmful text detection. First, a tuning joint optimization module is designed to update parameters and adapt to semantic variations during testing. It tunes the model by optimizing a consistency loss and applying word-level attention constraints, which reduce over-reliance on local words and encourage a more robust global representation. Then, to mitigate overfitting caused by continuous tuning, an RL–guided adaptive decision model is introduced to direct the tuning process. It reduces the influence of local samples by selecting data and controlling parameter updates, thereby improving overall test performance. Experimental results show that RLAT outperforms state-of-the-art baselines in cross-platform and cross-temporal scenarios across multiple public datasets.
- Anthology ID:
- 2026.acl-long.1623
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 35158–35174
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1623/
- Cite (ACL):
- Mengyu Xiang, Tinghao Chen, Boxu Han, Qiudan Li, Shu Wu, and Daniel Dajun Zeng. 2026. Reinforcement Learning–Guided Adaptive Tuning for Out-of-Distribution Harmful Text Detection. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 35158–35174, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Reinforcement Learning–Guided Adaptive Tuning for Out-of-Distribution Harmful Text Detection (Xiang et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1623.pdf