言行不一:大语言模型决策中的隐性偏见 (Words and Deeds at Odds: Implicit Bias in Large Language Model Decision-Making)

林莘茹, Luyang Li, Xiangting Liu


Abstract
"大语言模型的隐性偏见会隐蔽地影响模型的决策过程,使其在应用中难以保证公平性。本文首先构建基于决策的提示数据集进行隐性偏见评估,实验结果表明性能强的大语言模型可能表现出更严重的隐性偏见。进而为了缓解模型的隐性偏见,本文探索了自我反思和模型编辑两类方法。实验发现前者有助于识别隐性偏见,但无法在回答中去偏。在模型编辑实验中通过构建纠偏数据集,得出对模型后四层进行微调可获得最佳去偏效果,这一结论显示出有限参数调整在缓解隐性偏见方面的潜力。"
Anthology ID:
2025.ccl-1.57
Volume:
Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
Month:
August
Year:
2025
Address:
Jinan, China
Editors:
Maosong Sun, Peiyong Duan, Zhiyuan Liu, Ruifeng Xu, Weiwei Sun
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
756–768
URL:
https://preview.aclanthology.org/ingest-ccl/2025.ccl-1.57/
Cite (ACL):
林莘茹, Luyang Li, and Xiangting Liu. 2025. 言行不一:大语言模型决策中的隐性偏见. In Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025), pages 756–768, Jinan, China. Chinese Information Processing Society of China.
Cite (Informal):
言行不一:大语言模型决策中的隐性偏见 (林莘茹 et al., CCL 2025)
PDF:
https://preview.aclanthology.org/ingest-ccl/2025.ccl-1.57.pdf