Kun Zhang

Papers on this page may belong to the following people: Kun Zhang, Kun Zhang, Kun Zhang (Inria Saclay-Île-de-France), Kun Zhang (University of Chinese Academy of Sciences), Kun Zhang (University of Science and Technology of China)

2025

pdf bib abs
Unmasking Style Sensitivity: A Causal Analysis of Bias Evaluation Instability in Large Language Models
Jiaxu Zhao | Meng Fang | Kun Zhang | Mykola Pechenizkiy
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Natural language processing applications are increasingly prevalent, but social biases in their outputs remain a critical challenge. While various bias evaluation methods have been proposed, these assessments show unexpected instability when input texts undergo minor stylistic changes. This paper conducts a comprehensive analysis of how different style transformations impact bias evaluation results across multiple language models and bias types using causal inference techniques. Our findings reveal that formality transformations significantly affect bias scores, with informal style showing substantial bias reductions (up to 8.33% in LLaMA-2-13B). We identify appearance bias, sexual orientation bias, and religious bias as most susceptible to style changes, with variations exceeding 20%. Larger models demonstrate greater sensitivity to stylistic variations, with bias measurements fluctuating up to 3.1% more than in smaller models. These results highlight critical limitations in current bias evaluation methods and emphasize the need for reliable and fair assessments of language models.

pdf bib abs
Structured Discourse Representation for Factual Consistency Verification
Kun Zhang | Oana Balalau | Ioana Manolescu
Findings of the Association for Computational Linguistics: ACL 2025

Analysing the differences in how events are represented across texts, or verifying whether the language model generations hallucinate, requires the ability to systematically compare their content. To support such comparison, structured representation that captures fine-grained information plays a vital role.In particular, identifying distinct atomic facts and the discourse relations connecting them enables deeper semantic comparison. Our proposed approach combines structured discourse information extraction with a classifier, FDSpotter, for factual consistency verification. We show that adversarial discourse relations pose challenges for language models, but fine-tuning on our annotated data, DiscInfer, achieves competitive performance. Our proposed approach advances factual consistency verification by grounding in linguistic structure and decomposing it into interpretable components. We demonstrate the effectiveness of our method on the evaluation of two tasks: data-to-text generation and text summarisation. Our code and dataset will be publicly available on GitHub.

pdf bib abs
TIU-Bench: A Benchmark for Evaluating Large Multimodal Models on Text-rich Image Understanding
Kun Zhang | Liqiang Niu | Zhen Cao | Fandong Meng | Jie Zhou
Findings of the Association for Computational Linguistics: EMNLP 2025

Text-rich images are ubiquitous in real-world applications, serving as a critical medium for conveying complex information and facilitating accessibility.Despite recent advances driven by Multimodal Large Language Models (MLLMs), existing benchmarks suffer from limited scale, fragmented scenarios, and evaluation protocols that fail to fully capture holistic image understanding.To address these gaps, we present TIU-Bench, a large-scale, multilingual benchmark comprising over 100,000 full-image annotations and 22,000 rigorously validated question-answer (QA) pairs that span 18 subtasks across diverse real-world scenarios.TIU-Bench introduces a novel full-image structured output format that jointly models geometric, textual, and relational information, enabling fine-grained evaluation of perception and reasoning capabilities. Furthermore, we propose a two-stage understanding framework named T2TIU, which first generates a structured representation of the entire image and subsequently conducts reasoning on this representation to address complex visual-textual queries.Extensive experiments on 10 state-of-the-art generative models highlight the challenges and opportunities in advancing text-rich image understanding.Our benchmark and framework provide a comprehensive platform for developing and evaluating next-generation multimodal AI systems.

While several previous studies have analyzed gender bias in research, we are still missing a comprehensive analysis of gender differences in the AI community, covering diverse topics and different development trends. Using the AI Scholar dataset of 78K researchers in the field of AI, we identify several gender differences: (1) Although female researchers tend to have fewer overall citations than males, this citation difference does not hold for all academic-age groups; (2) There exist large gender homophily in co-authorship on AI papers; (3) Female first-authored papers show distinct linguistic styles, such as longer text, more positive emotion words, and more catchy titles than male first-authored papers. Our analysis provides a window into the current demographic trends in our AI community, and encourages more gender equality and diversity in the future.

2023

pdf bib abs
Uncertainty Guided Label Denoising for Document-level Distant Relation Extraction
Qi Sun | Kun Huang | Xiaocui Yang | Pengfei Hong | Kun Zhang | Soujanya Poria
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Document-level relation extraction (DocRE) aims to infer complex semantic relations among entities in a document. Distant supervision (DS) is able to generate massive auto-labeled data, which can improve DocRE performance. Recent works leverage pseudo labels generated by the pre-denoising model to reduce noise in DS data. However, unreliable pseudo labels bring new noise, e.g., adding false pseudo labels and losing correct DS labels. Therefore, how to select effective pseudo labels to denoise DS data is still a challenge in document-level distant relation extraction. To tackle this issue, we introduce uncertainty estimation technology to determine whether pseudo labels can be trusted. In this work, we propose a Document-level distant Relation Extraction framework with Uncertainty Guided label denoising, UGDRE. Specifically, we propose a novel instance-level uncertainty estimation method, which measures the reliability of the pseudo labels with overlapping relations. By further considering the long-tail problem, we design dynamic uncertainty thresholds for different types of relations to filter high-uncertainty pseudo labels. We conduct experiments on two public datasets. Our framework outperforms strong baselines by 1.91 F1 and 2.28 Ign F1 on the RE-DocRED dataset.

2022

pdf bib abs
CausalNLP Tutorial: An Introduction to Causality for Natural Language Processing
Zhijing Jin | Amir Feder | Kun Zhang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

Causal inference is becoming an increasingly important topic in deep learning, with the potential to help with critical deep learning problems such as model robustness, interpretability, and fairness. In addition, causality is naturally widely used in various disciplines of science, to discover causal relationships among variables and estimate causal effects of interest. In this tutorial, we introduce the fundamentals of causal discovery and causal effect estimation to the natural language processing (NLP) audience, provide an overview of causal perspectives to NLP problems, and aim to inspire novel approaches to NLP further. This tutorial is inclusive to a variety of audiences and is expected to facilitate the community’s developments in formulating and addressing new, important NLP problems in light of emerging causal principles and methodologies.

Aspect-based sentiment analysis (ABSA) predicts sentiment polarity towards a specific aspect in the given sentence. While pre-trained language models such as BERT have achieved great success, incorporating dynamic semantic changes into ABSA remains challenging. To this end, in this paper, we propose to address this problem by Dynamic Re-weighting BERT (DR-BERT), a novel method designed to learn dynamic aspect-oriented semantics for ABSA. Specifically, we first take the Stack-BERT layers as a primary encoder to grasp the overall semantic of the sentence and then fine-tune it by incorporating a lightweight Dynamic Re-weighting Adapter (DRA). Note that the DRA can pay close attention to a small region of the sentences at each step and re-weigh the vitally important words for better aspect-aware sentiment understanding. Finally, experimental results on three benchmark datasets demonstrate the effectiveness and the rationality of our proposed model and provide good interpretable insights for future semantic modeling.