Yiheng Zhao
2026
Mitigating Causal Bias in LLMs via Potential Outcomes Framework and Actual Causality Theory
Yiheng Zhao | Yuanliang Li | Shreya Savant | Jun Yan
Findings of the Association for Computational Linguistics: EACL 2026
Event Causality Identification (ECI) aims to identify causal relationships between events, which is essential for root cause analysis. While recent studies reveal that Large Language Models (LLMs) exhibit significant causal hallucination, a systematic evaluation of their document-level ECI performance across varied structural characteristics, along with a corresponding dataset, is currently lacking. To fill this gap, we first construct a structure-controlled dataset to comprehensively assess document-level ECI performance across texts whose structural characteristics influence causal behavior in ECI. We find that different LLMs exhibit divergent causal bias across texts with varied structures, ranging from consistent hallucination or neglect to structure-dependent shifts between the two. Furthermore, to mitigate this bias, we formulate ECI as a causal inference problem and propose a causality identification framework grounded in the potential outcomes framework and the Halpern–Pearl (HP) definition of actual causality. Experimental results demonstrate that our framework significantly reduces the causal bias associated with directly applying LLMs to ECI, while also achieving superior performance.
2025
Is OpenVLA Truly Robust? A Systematic Evaluation of Positional Robustness
Yiran Pang | Yiheng Zhao | Zhuopu Zhou | Tingkai Hu | Ranxin Hou
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Pretrained language and vision-language models have become core components in building vision-language-action models (VLAs) due to their strong spatial reasoning capabilities. Evaluating the robustness of VLAs is crucial to ensuring their reliability in practical scenarios. Although prior work has focused on background and environment robustness, positional robustness remains underexplored. In this paper, we propose a comprehensive evaluation protocol to assess the positional robustness of VLAs and apply it to OpenVLA, an open-source, high-performing, and efficient model well suited for real-world deployment. We find that OpenVLA succeeds only when the target object is placed at one of the two positions encountered during training. Even in these cases, the success rate never exceeds 50%, because the model exhibits memorized behavior: it randomly executes a grasping action toward one of the two fixed positions without relying on perception to localize the target object. This reveals that OpenVLA’s positional robustness is extremely weak.