言行不一:大语言模型决策中的隐性偏见 (Words and Deeds at Odds: Implicit Bias in Large Language Model Decision-Making)

林莘茹; Luyang Li; Xiangting Liu

Abstract

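Implicit bias in large language models (LLMs) covertly influences their decision-making, which makes fairness hard to guarantee in deployed applications. This paper first constructs a dataset of decision-based prompts to evaluate implicit bias; the experimental results show that stronger-performing LLMs may exhibit more severe implicit bias. To mitigate this bias, the paper then explores two kinds of methods: self-reflection and model editing. The experiments find that self-reflection helps a model identify implicit bias but does not remove the bias from its answers. In the model-editing experiments, a debiasing dataset is constructed, and fine-tuning the last four layers of the model is found to give the best debiasing effect. This result suggests that adjusting a limited set of parameters has potential for mitigating implicit bias.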

Anthology ID: 2025.ccl-1.57
Volume: Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
Month: August
Year: 2025
Address: Jinan, China
Editors: Maosong Sun, Peiyong Duan, Zhiyuan Liu, Ruifeng Xu, Weiwei Sun
Venue: CCL
Publisher: Chinese Information Processing Society of China
Pages: 756–768
URL: https://preview.aclanthology.org/ingest-ccl/2025.ccl-1.57/
Cite (ACL): 林莘茹, Luyang Li, and Xiangting Liu. 2025. 言行不一:大语言模型决策中的隐性偏见. In Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025), pages 756–768, Jinan, China. Chinese Information Processing Society of China.
Cite (Informal): 言行不一:大语言模型决策中的隐性偏见 (林莘茹 et al., CCL 2025)
PDF: https://preview.aclanthology.org/ingest-ccl/2025.ccl-1.57.pdf