Bruce Qin

2026

In recent years, there has been a surge of interest in Cultural NLP, with substantial efforts to create globally inclusive NLP systems. The rapid growth of literature in this field makes it difficult to track trends in methods and data resources. To address this, we survey over 375 papers to answer three complementary questions: (1) What Cultural Capabilities (CCs) are being targeted in NLP systems? (2) How are cultural data resources being created? and (3) What methods are being used to improve the CCs of those systems? We discuss trends observed across the three questions and identify relevant research gaps. To facilitate further research, we release our full list of surveyed papers as an interactive web interface, CultureMine, which also lets researchers add their own work; we hope it proves to be a valuable resource for the Cultural NLP community.


Iterative Dual-Model Alignment for Story Evaluation
Bruce Qin | Dan Goldwasser
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large language models (LLMs) can both evaluate and explain text quality; however, most existing evaluators operate as static classifiers and lack the ability to refine their reasoning through interaction. We propose an Iterative Alpha–Beta Learning framework that jointly trains two complementary 8B models: an Alpha (𝛼) classifier that assesses pairwise story engagement, and a Beta (𝛽) generator that produces structured, rubric-guided comparative explanations. The two models co-evolve within a closed feedback loop: 𝛼 provides probabilistic preference signals to guide 𝛽’s Direct Preference Optimization (DPO), while 𝛽’s improved explanations are reintegrated to retrain 𝛼 via a KL-based contrastive objective. This dual optimization enables mutual learning: 𝛼 gains interpretability and robustness from 𝛽’s textual rationales, while 𝛽 acquires stronger alignment and discriminative precision from 𝛼’s confidence deltas. Experiments on human-annotated story pairs from the HANNA dataset show that the proposed system consistently outperforms strong single-model baselines in both accuracy and explanation quality across multiple iterative rounds.
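
For readers unfamiliar with the DPO step mentioned in the abstract, the following is a minimal sketch of the standard DPO loss for a single (chosen, rejected) pair. It is not the authors' implementation: the abstract does not say how 𝛼's preference probabilities are turned into pairs, so that step is left out. The function name `dpo_loss`, its parameter names, and the default `temperature` value are all assumptions made for illustration (the DPO scaling coefficient is called `temperature` here to avoid confusion with the 𝛽 model).

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, temperature=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the trained policy and a
    frozen reference model. All names here are illustrative assumptions.
    """
    # Margin: how much more strongly the policy prefers the chosen response
    # over the rejected one, relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = temperature * margin
    # -log(sigmoid(z)) computed in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2; it falls below that as the policy shifts probability toward the chosen explanation.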

Co-authors

Zhaoqing Wu 1
