Human-AI Moral Judgment Congruence on Real-World Scenarios: A Cross-Lingual Analysis

Nan Li, Bo Kang, Tijl De Bie


Abstract
As Large Language Models (LLMs) are deployed across ever more aspects of daily life, understanding how they reason about moral issues becomes critical for AI safety. We investigate this question using a dataset we curated from Reddit’s r/AmItheAsshole, comprising real-world moral dilemmas with crowd-sourced verdicts. In experiments with five state-of-the-art LLMs on 847 posts, we find a significant and systematic divergence: LLMs are more lenient than humans. Moreover, translating the posts into another language changes the LLMs’ verdicts, indicating that their judgments lack cross-lingual stability.
Anthology ID: 2025.winlp-main.10
Volume: Proceedings of the 9th Widening NLP Workshop
Month: November
Year: 2025
Address: Suzhou, China
Editors: Chen Zhang, Emily Allaway, Hua Shen, Lesly Miculicich, Yinqiao Li, Meryem M'hamdi, Peerat Limkonchotiwat, Richard He Bai, Santosh T.y.s.s., Sophia Simeng Han, Surendrabikram Thapa, Wiem Ben Rim
Venues: WiNLP | WS
Publisher: Association for Computational Linguistics
Pages: 46–49
URL: https://preview.aclanthology.org/ingest-emnlp/2025.winlp-main.10/
Cite (ACL): Nan Li, Bo Kang, and Tijl De Bie. 2025. Human-AI Moral Judgment Congruence on Real-World Scenarios: A Cross-Lingual Analysis. In Proceedings of the 9th Widening NLP Workshop, pages 46–49, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Human-AI Moral Judgment Congruence on Real-World Scenarios: A Cross-Lingual Analysis (Li et al., WiNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.winlp-main.10.pdf