Meng-Chen Wu


2025

Incorporating Diverse Perspectives in Cultural Alignment: Survey of Evaluation Benchmarks Through A Three-Dimensional Framework
Meng-Chen Wu | Si-Chi Chin | Tess Wood | Ayush Goyal | Narayanan Sadagopan
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large Language Models (LLMs) increasingly serve diverse global audiences, making cultural alignment critical for responsible AI deployment. While recent works have proposed various approaches to enhance cultural alignment in LLMs, a systematic analysis of their evaluation benchmarks is still needed. We propose a novel framework that conceptualizes alignment along three dimensions: Cultural Group (who to align with), Cultural Elements (what to align), and Awareness Scope (how to align: majority-focused vs. diversity-aware). Through this framework, we analyze 105 cultural alignment evaluation benchmarks, revealing significant imbalances: Region (37.9%) and Language (28.9%) dominate Cultural Group representation; Social and Political Relations (25.1%) and Speech and Language (20.9%) concentrate Cultural Elements coverage; and an overwhelming majority (97.1%) of datasets adopt majority-focused Awareness Scope approaches. In a case study examining AI safety evaluation across nine Asian countries (Section 5), we demonstrate how our framework reveals critical gaps between existing benchmarks and the real-world cultural biases identified in the study, providing actionable guidance for developing more comprehensive evaluation resources tailored to specific deployment contexts.

SEEval: Advancing LLM Text Evaluation Efficiency and Accuracy through Self-Explanation Prompting
Meng-Chen Wu | Md Mosharaf Hossain | Tess Wood | Shayan Ali Akbar | Si-Chi Chin | Erwin Cornejo
Findings of the Association for Computational Linguistics: NAACL 2025

Large language models (LLMs) have achieved remarkable success in various natural language generation (NLG) tasks, but they are not yet ready to replace human judges in automatic text evaluation. In this paper, we propose SEEval (Self-Explanation in Evaluation), a novel prompt-based text evaluator. Inspired by educational psychology, SEEval incorporates self-explanation, a metacognitive strategy, to enhance automatic text evaluation. Our experimental results show that SEEval, without probability normalization, achieves competitive and often superior performance compared to the two state-of-the-art baselines, G-Eval and Analyze-Rate, across all evaluation dimensions, while being 20 times more efficient in terms of run-time. The SEEval method is also generalizable, as its results are consistent across three other selected LLMs: Claude 3.5 Sonnet, Command R+, and Mistral-Large 2.