Understanding and Mitigating Political Stance Cross-topic Generalization in Large Language Models

Jiayi Zhang; Shu Yang; Junchao Wu; Derek F. Wong (黄辉); Di Wang

Abstract

Fine-tuning large language models on one political topic significantly shifts their political stance and unintentionally affects their stance on a broad range of other topics. While previous studies have raised this issue, the internal representations of these stances and the mechanisms that lead to unintended cross-topic generalization remain poorly understood. In this paper, we systematically explore the internal mechanisms underlying this phenomenon from a neuron-level perspective and study how to mitigate the cross-topic generalization of political fine-tuning. First, we propose Political Neuron Localization through Activation Contrasting (PNLAC) to identify two distinct types of political neurons: general political neurons, which govern stance across multiple political topics, and topic-specific neurons, which affect the model’s political stance on individual topics. Through activation patching experiments, we find that both neuron types exist in the middle and later layers across four models and datasets. Leveraging these insights, we introduce InhibitFT, an inhibition-based fine-tuning method that effectively mitigates cross-topic stance generalization. Experimental results demonstrate the robustness of the identified neuron types across various models and datasets and show that InhibitFT significantly reduces cross-topic stance generalization by 20% on average while preserving topic-specific performance. Moreover, we demonstrate that selectively inhibiting only 5% of neurons is sufficient to mitigate cross-topic stance generalization.

Anthology ID: 2026.acl-long.1374
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 29775–29797
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1374/
Cite (ACL): Jiayi Zhang, Shu Yang, Junchao Wu, Derek F. Wong, and Di Wang. 2026. Understanding and Mitigating Political Stance Cross-topic Generalization in Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 29775–29797, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Understanding and Mitigating Political Stance Cross-topic Generalization in Large Language Models (Zhang et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1374.pdf
Checklist: 2026.acl-long.1374.checklist.pdf
