Ayush Goyal


2025

Incorporating Diverse Perspectives in Cultural Alignment: Survey of Evaluation Benchmarks Through A Three-Dimensional Framework
Meng-Chen Wu | Si-Chi Chin | Tess Wood | Ayush Goyal | Narayanan Sadagopan
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large Language Models (LLMs) increasingly serve diverse global audiences, making cultural alignment critical for responsible AI deployment. While recent work has proposed various approaches to enhancing cultural alignment in LLMs, a systematic analysis of their evaluation benchmarks is still lacking. We propose a novel framework that conceptualizes alignment along three dimensions: Cultural Group (who to align with), Cultural Elements (what to align), and Awareness Scope (how to align: majority-focused vs. diversity-aware). Through this framework, we analyze 105 cultural alignment evaluation benchmarks and reveal significant imbalances: Region (37.9%) and Language (28.9%) dominate Cultural Group representation; Cultural Elements coverage is concentrated in Social and Political Relations (25.1%) and Speech and Language (20.9%); and an overwhelming majority (97.1%) of datasets adopt a majority-focused Awareness Scope. In a case study examining AI safety evaluation across nine Asian countries (Section 5), we demonstrate how our framework reveals critical gaps between existing benchmarks and real-world cultural biases identified in the study, providing actionable guidance for developing more comprehensive evaluation resources tailored to specific deployment contexts.
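
The abstract describes the framework as a three-way annotation of each benchmark; the sketch below is only a hypothetical illustration of that idea in Python. The enum values, field names, and example entry are placeholders chosen for this sketch, not the paper's actual taxonomy or data.

```python
from dataclasses import dataclass
from enum import Enum

# Illustrative category values only; the paper's taxonomy is richer than this sketch.
class CulturalGroup(Enum):      # "who to align with"
    REGION = "region"
    LANGUAGE = "language"
    ETHNICITY = "ethnicity"

class AwarenessScope(Enum):     # "how to align"
    MAJORITY_FOCUSED = "majority-focused"
    DIVERSITY_AWARE = "diversity-aware"

@dataclass
class BenchmarkAnnotation:
    name: str
    cultural_group: CulturalGroup
    cultural_elements: list[str]        # "what to align", as free-text element labels
    awareness_scope: AwarenessScope

# Hypothetical entry, not taken from the paper's 105-benchmark analysis.
example = BenchmarkAnnotation(
    name="SomeCulturalQA",
    cultural_group=CulturalGroup.REGION,
    cultural_elements=["Social and Political Relations"],
    awareness_scope=AwarenessScope.MAJORITY_FOCUSED,
)
```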

A Calibrated Reflection Approach for Enhancing Confidence Estimation in LLMs
Umesh Bodhwani | Yuan Ling | Shujing Dong | Yarong Feng | Hongfei Li | Ayush Goyal
Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)

A critical challenge in deploying Large Language Models (LLMs) is developing reliable mechanisms to estimate their confidence, enabling systems to determine when to trust model outputs and when to seek human intervention. In this paper, we present a Calibrated Reflection Approach for Enhancing Confidence Estimation in LLMs, a framework that combines structured reasoning with distance-aware calibration techniques. Our approach introduces three key innovations: (1) a Maximum Confidence Selection (MCS) method that comprehensively evaluates confidence across all possible labels, (2) a reflection-based prompting mechanism that enhances reasoning reliability, and (3) a distance-aware calibration technique that accounts for ordinal relationships between labels. We evaluate our framework across diverse datasets, including HelpSteer2, Llama T-REx, and an internal conversational dataset, demonstrating its effectiveness across both conversational and fact-based classification tasks. This work contributes to the broader goal of developing reliable and well-calibrated confidence estimation methods for LLMs, enabling informed decisions about when to trust model outputs and when to defer to human judgement.
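
The abstract does not spell out the exact formulas, so the following is only a minimal sketch of two of the ideas it names, under stated assumptions: picking the label with the highest elicited confidence (a stand-in for Maximum Confidence Selection) and scoring calibration with an ordinal, distance-weighted notion of correctness instead of a 0/1 one. The function names, the 5-point scale, and the penalty form are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def select_max_confidence(label_scores: dict[str, float]) -> tuple[str, float]:
    """Pick the label with the highest elicited confidence.

    Assumes the scores come from prompting the LLM for a confidence on every
    candidate label, as the abstract describes for MCS.
    """
    label = max(label_scores, key=label_scores.get)
    return label, label_scores[label]

def distance_aware_error(pred_idx: int, true_idx: int, num_labels: int) -> float:
    """Ordinal error in [0, 1]: predictions far from the true label cost more.

    A generic ordinal penalty, not necessarily the paper's calibration term.
    """
    return abs(pred_idx - true_idx) / (num_labels - 1)

def distance_aware_calibration_gap(confidences, pred_idxs, true_idxs, num_labels):
    """Average |confidence - (1 - ordinal error)| over a labelled evaluation set."""
    gaps = []
    for conf, p, t in zip(confidences, pred_idxs, true_idxs):
        correctness = 1.0 - distance_aware_error(p, t, num_labels)
        gaps.append(abs(conf - correctness))
    return float(np.mean(gaps))

# Hypothetical 5-point ordinal rating scale (e.g. HelpSteer2-style 0-4 ratings).
scores = {"0": 0.05, "1": 0.10, "2": 0.20, "3": 0.45, "4": 0.20}
label, conf = select_max_confidence(scores)                       # -> "3", 0.45
gap = distance_aware_calibration_gap([conf], [3], [4], num_labels=5)
```

The point of the distance-weighted gap is that, on ordinal labels, predicting a 3 when the truth is 4 should be penalized less than predicting a 0, which a standard 0/1 calibration error cannot express.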