Tian Lan



2025

A Mutual Information Perspective on Knowledge Graph Embedding
Jiang Li | Xiangdong Su | Zehua Duo | Tian Lan | Xiaotao Guo | Guanglai Gao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Knowledge graph embedding techniques have emerged as a critical approach for addressing the issue of missing relations in knowledge graphs. However, existing methods often suffer from limitations, including high intra-group similarity, loss of semantic information, and insufficient inference capability, particularly in complex relation patterns such as 1-N and N-1 relations. To address these challenges, we introduce a novel KGE framework that leverages mutual information maximization to improve the semantic representation of entities and relations. By maximizing the mutual information between different components of triples, such as (h, r) and t, or (r, t) and h, the proposed method improves the model’s ability to preserve semantic dependencies while maintaining the relational structure of the knowledge graph. Extensive experiments on benchmark datasets demonstrate the effectiveness of our approach, with consistent performance improvements across various baseline models. Additionally, visualization analyses and case studies demonstrate the improved ability of the MI framework to capture complex relation patterns.
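The abstract describes maximizing mutual information between triple components such as (h, r) and t. As an illustration only (not the paper's actual objective or architecture), one common way to maximize such mutual information is an InfoNCE-style contrastive bound over in-batch negatives; the module structure, embedding dimension, and temperature below are assumptions made for the sketch.

```python
# Illustrative sketch only: an InfoNCE-style lower bound on I((h, r); t) for
# knowledge graph embeddings. The combiner MLP, dimension, and temperature
# are hypothetical choices, not the paper's specification.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MIKGESketch(nn.Module):
    def __init__(self, num_entities, num_relations, dim=200, temperature=0.1):
        super().__init__()
        self.ent = nn.Embedding(num_entities, dim)
        self.rel = nn.Embedding(num_relations, dim)
        # Maps the (h, r) pair into the same space as tail-entity embeddings.
        self.combine = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.tau = temperature

    def forward(self, h_idx, r_idx, t_idx):
        # Encode (h, r) jointly and the tail t separately.
        hr = self.combine(torch.cat([self.ent(h_idx), self.rel(r_idx)], dim=-1))
        t = self.ent(t_idx)
        hr = F.normalize(hr, dim=-1)
        t = F.normalize(t, dim=-1)
        # In-batch negatives: each (h, r) should score its own tail t highest.
        logits = hr @ t.t() / self.tau                      # [B, B]
        labels = torch.arange(h_idx.size(0), device=h_idx.device)
        return F.cross_entropy(logits, labels)              # InfoNCE loss

# Toy usage
model = MIKGESketch(num_entities=1000, num_relations=50)
h = torch.randint(0, 1000, (32,))
r = torch.randint(0, 50, (32,))
t = torch.randint(0, 1000, (32,))
loss = model(h, r, t)
loss.backward()
```

The same construction applies symmetrically to the (r, t) and h pairing mentioned in the abstract by swapping which components are jointly encoded.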

F²Bench: An Open-ended Fairness Evaluation Benchmark for LLMs with Factuality Considerations
Tian Lan | Jiang Li | Yemin Wang | Xu Liu | Xiangdong Su | Guanglai Gao
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

With the growing adoption of large language models (LLMs) in NLP tasks, concerns about their fairness have intensified. Yet, most existing fairness benchmarks rely on closed-ended evaluation formats, which diverge from real-world open-ended interactions. These formats are prone to position bias and introduce a “minimum score” effect, where models can earn partial credit simply by guessing. Moreover, such benchmarks often overlook factuality considerations rooted in historical, social, physiological, and cultural contexts, and rarely account for intersectional biases. To address these limitations, we propose F²Bench: an open-ended fairness evaluation benchmark for LLMs that explicitly incorporates factuality considerations. F²Bench comprises 2,568 instances across 10 demographic groups and two open-ended tasks. By integrating text generation, multi-turn reasoning, and factual grounding, F²Bench aims to more accurately reflect the complexities of real-world model usage. We conduct a comprehensive evaluation of several LLMs across different series and parameter sizes. Our results reveal that all models exhibit varying degrees of fairness issues. We further compare open-ended and closed-ended evaluations, analyze model-specific disparities, and provide actionable recommendations for future model development. Our code and dataset are publicly available at https://github.com/VelikayaScarlet/F2Bench.

McBE: A Multi-task Chinese Bias Evaluation Benchmark for Large Language Models
Tian Lan | Xiangdong Su | Xu Liu | Ruirui Wang | Ke Chang | Jiang Li | Guanglai Gao
Findings of the Association for Computational Linguistics: ACL 2025

As large language models (LLMs) are increasingly applied to various NLP tasks, their inherent biases are gradually being disclosed. Measuring bias in LLMs is therefore crucial to mitigating their ethical risks. However, most existing bias evaluation datasets focus on English and North American culture, and their bias categories are not fully applicable to other cultures. Datasets grounded in the Chinese language and culture are scarce. More importantly, these datasets usually support only a single evaluation task and cannot evaluate bias in LLMs from multiple aspects. To address these issues, we present a Multi-task Chinese Bias Evaluation Benchmark (McBE) that includes 4,077 bias evaluation instances covering 12 single bias categories and 82 subcategories, and introduces 5 evaluation tasks, providing extensive category coverage, content diversity, and comprehensive measurement. Additionally, we evaluate several popular LLMs from different series and with different parameter sizes. In general, all these LLMs demonstrate varying degrees of bias. We conduct an in-depth analysis of the results, offering novel insights into bias in LLMs.