Xianling Mao

Also published as: 先领


2026

SemEval-2026 Task 7 evaluates the ability of Large Language Models (LLMs) to reason about diverse daily knowledge across 30 geographic regions. In this paper, team uir-cis-7 approaches this challenge not merely as an accuracy optimization problem, but as a diagnostic probe to evaluate the representational limits of LLMs without fine-tuning. To address Western-centric bias and the "overthinking penalty" frequently observed in high-resource contexts, we introduce a Two-Tier Dynamic Routing framework. Based on cultural resource density, queries are routed either to a direct-answer pathway or a complex reasoning pathway. The complex pathway utilizes an Anti-Bias Persona-Conditioned Chain-of-Thought enhanced with Knowledge Anchoring and multi-path Self-Consistency voting to mitigate majority-culture heuristics. Evaluated using a strict macro-average metric, our system achieved an overall accuracy of 89.02% on the official leaderboard. Our fine-grained evaluation and theoretical error analysis quantify the epistemological boundaries of prompt-based alignment, proving our dynamic strategy effectively rescues marginalized cultural knowledge while exposing persistent instances where safety-aligned models project Western progressive norms onto traditional contexts. Furthermore, cross-model validation on open-source architectures explicitly confirms our framework’s generalizability.
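
The routing and voting steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the density scores, the `DENSITY_THRESHOLD` value, the function names, and the assumption that high-density regions take the direct-answer pathway are all hypothetical, since the abstract does not specify them. The `direct` and `reason` callables stand in for LLM calls.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical cut-off; the abstract does not state how density is thresholded.
DENSITY_THRESHOLD = 0.5


def majority_vote(answers: List[str]) -> str:
    """Self-consistency voting: return the most frequent sampled answer."""
    return Counter(answers).most_common(1)[0][0]


def route(question: str, region: str, density: Dict[str, float],
          direct: Callable[[str], str], reason: Callable[[str], str],
          n_paths: int = 5) -> str:
    """Two-tier routing keyed on a per-region resource-density score.

    Assumption: regions at or above the threshold use the direct pathway,
    all others sample several reasoning paths and vote.
    """
    if density.get(region, 0.0) >= DENSITY_THRESHOLD:
        return direct(question)
    return majority_vote([reason(question) for _ in range(n_paths)])


def macro_average(correct: Dict[str, List[bool]]) -> float:
    """Mean of per-region accuracies, so every region carries equal weight."""
    per_region = [sum(flags) / len(flags) for flags in correct.values() if flags]
    return sum(per_region) / len(per_region)
```

Under macro-averaging, a region with few test items influences the final score as much as a region with many, which is why a routing strategy that helps low-resource regions can move the overall number noticeably.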

2024

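Human evaluation, the gold standard for assessing the quality of generated text, is too costly. The core idea of automatic evaluation is to make its results correlate highly with human evaluation, thereby enabling automated analysis and assessment of generated-text quality. As techniques in natural language processing have advanced, automatic evaluation of generated-text quality has gone through several shifts in technical paradigm. However, the research community still lacks a systematic summary of these techniques. This paper therefore first systematically categorizes and summarizes existing automatic evaluation methods for generated text, then analyzes their main development trends, and finally, to give readers a broader view of automatic evaluation as a whole, discusses future research directions for the field.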

2020

Multilingual pretrained language models (such as multilingual BERT) have achieved impressive results for cross-lingual transfer. However, due to the constant model capacity, multilingual pre-training usually lags behind the monolingual competitors. In this work, we present two approaches to improve zero-shot cross-lingual classification, by transferring the knowledge from monolingual pretrained models to multilingual ones. Experimental results on two cross-lingual classification benchmarks show that our methods outperform vanilla multilingual fine-tuning.