Jiaxin Yuan
Also published as: 佳欣 袁
2026
Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings
Aakriti Agrawal | Gouthaman KV | Rohith Aralikatti | Gauri Jagatap | Jiaxin Yuan | Sarvesh Baskar | Vijay Kamarshi | Andrea Fanelli | Furong Huang
Findings of the Association for Computational Linguistics: ACL 2026
Hallucinations in Large Vision-Language Models (LVLMs) remain a persistent challenge, often stemming from inadequate integration of visual information during multimodal reasoning. A key cause is the model’s over-reliance on textual priors and underutilization of visual cues, leading to outputs that are linguistically fluent but visually inaccurate. For example, given an image of an empty kitchen countertop, an LVLM might hallucinate a “bowl of fruit” or “cup of coffee,” relying on language associations rather than visual evidence. Most LVLMs incorporate visual features by appending them to the input stream of a pre-trained LLM and training on large-scale vision-language datasets. Our systematic analysis reveals that this strategy often leads to over-dependence on textual information due to the inherent bias of LLMs towards language-dominant representations. This imbalance skews attention towards the text over visual content, weakening the model’s ability to ground outputs in visual inputs. To address this, we propose a simple yet effective visual feature incorporation method that encourages the model to learn visually-informed textual embeddings distinct from those of the base LLM and promotes a more balanced attention distribution. Experimental results across multiple hallucination benchmarks demonstrate that our method significantly reduces hallucinations and fosters more balanced multimodal reasoning. Notably, our approach achieves substantial gains, including +9.33% on MMVP-MLLM, +2.99% on POPE-AOKVQA, up to +3.4% on Merlin, and +3% on the hard-data split of HallusionBench.
Beyond Self-Report: Bridging the Intention-Behavior Gap in Critical Thinking Assessment via Interpretable Multi-Agent System
Zekun Li | Jifan Yu | Haoxuan Li | Ye He | Daniel Zhang-Li | Shangqing Tu | Joy Jia Yin Lim | Yikun Jiang | Jiaxin Yuan | Yu Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Accurate assessment of critical thinking is historically limited by the Intention-Behavior Gap in psychology: the disconnect between individuals' self-reported dispositions and their actual behaviors. We try to bridge this gap with MASA (Multi-Agent Scenario-based Assessment), a framework that operationalizes cognitive assessment as an interpretable and interactive multi-agent workflow with Assessment Chain-of-Thought (AsCoT). Validating on both large-scale simulations (N=1,161) and human participants (N=70), we find that MASA aligns better with human expert ratings (r=0.882) than traditional gold-standard inventories do (r=0.720), with an average cost of only 0.41 per participant. These results suggest that by shifting from self-report inventories to behavior-grounded dialogue, MASA offers a more accurate, cost-effective, and transparent solution for real-world cognitive evaluation.
2025
Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey
Xiaoyu Liu | Paiheng Xu | Junda Wu | Jiaxin Yuan | Yifan Yang | Yuhang Zhou | Fuxiao Liu | Tianrui Guan | Haoliang Wang | Tong Yu | Julian McAuley | Wei Ai | Furong Huang
Findings of the Association for Computational Linguistics: NAACL 2025
Causal inference has demonstrated significant potential to enhance Natural Language Processing (NLP) models in areas such as predictive accuracy, fairness, robustness, and explainability by capturing causal relationships among variables. The rise of generative Large Language Models (LLMs) has greatly impacted various language processing tasks. This survey focuses on research that evaluates or improves LLMs from a causal view in the following areas: reasoning capacity, fairness and safety issues, explainability, and handling multimodality. Meanwhile, LLMs can assist in causal inference tasks, such as causal relationship discovery and causal effect estimation, by leveraging their generation ability and knowledge learned during pre-training. This review explores the interplay between causal inference frameworks and LLMs from both perspectives, emphasizing their collective potential to further the development of more advanced and robust artificial intelligence systems.
On LLM-Based Scientific Inductive Reasoning Beyond Equations
Brian S. Lin | Jiaxin Yuan | Zihan Zhou | Shouli Wang | Shuo Wang | Cunliang Kong | Qi Shi | Yuxuan Li | Liner Yang | Zhiyuan Liu | Maosong Sun
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
As large language models (LLMs) increasingly exhibit human-like capabilities, a fundamental question emerges: How can we enable LLMs to learn the underlying patterns from limited examples in entirely novel environments and apply them effectively? This question is central to the ability of LLMs in inductive reasoning. Existing research on LLM-based inductive reasoning can be broadly categorized based on whether the underlying rules are expressible via explicit mathematical equations. However, many recent studies in the beyond-equations category have emphasized rule design without grounding them in specific scenarios. Inspired by the parallels between inductive reasoning and human scientific discovery, we propose the task of LLM-Based Scientific Inductive Reasoning Beyond Equations and introduce a new benchmark, SIRBench-V1, to evaluate the inductive reasoning abilities of LLMs in scientific settings. Our experimental results show that current LLMs still struggle with this task, underscoring its difficulty and the need for further advancement in this area.
2022
COMPILING: A Benchmark Dataset for Chinese Complexity Controllable Definition Generation
Jiaxin Yuan | Cunliang Kong | Chenhui Xie | Liner Yang | Erhong Yang
Proceedings of the 21st Chinese National Conference on Computational Linguistics
The definition generation task aims to automatically generate a word's definition within a specific context. However, owing to the lack of datasets covering different complexities, the definitions produced by models tend to stay at the same complexity level. This paper proposes a novel task of generating definitions for a word with controllable complexity levels. Correspondingly, we introduce COMPILING, a dataset that provides detailed information about Chinese definitions, in which each definition is labeled with its complexity level. The COMPILING dataset includes 74,303 words and 106,882 definitions. To the best of our knowledge, it is the largest dataset for the Chinese definition generation task. We select various representative generation methods as baselines for this task and conduct evaluations, which illustrate that our dataset helps models generate definitions at different complexity levels. We believe that the COMPILING dataset will benefit further research in complexity controllable definition generation.
2020
汉语学习者依存句法树库构建(Construction of a Treebank of Learner Chinese)
Jialu Shi (师佳璐) | Xinyu Luo (罗昕宇) | Liner Yang (杨麟儿) | Dan Xiao (肖丹) | Zhengsheng Hu (胡正声) | Yijun Wang (王一君) | Jiaxin Yuan (袁佳欣) | Yu Jingsi (余婧思) | Erhong Yang (杨尔弘)
Proceedings of the 19th Chinese National Conference on Computational Linguistics
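A dependency treebank of learner Chinese provides dependency parses for non-native speaker corpora. It can support second-language teaching and research, and it also matters for related work such as second-language-oriented syntactic parsing and grammatical error correction. However, existing dependency treebanks of learner Chinese are few in number, and problems remain in their annotation. To address this, this paper improves the dependency annotation guidelines, builds an online annotation platform, and carries out dependency annotation of learner Chinese. The paper focuses on issues such as data selection and the annotation workflow, analyzes the quality of the annotation results, and explores how second-language errors affect annotation quality and syntactic parsing.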
Co-authors
- Liner Yang 3
- Furong Huang 2
- Cunliang Kong (孔存良) 2
- Erhong Yang 2
- Aakriti Agrawal 1
- Wei Ai 1
- Rohith Aralikatti 1
- Sarvesh Baskar 1
- Andrea Fanelli 1
- Tianrui Guan 1
- Ye He 1
- Zhengsheng Hu 1
- Gauri Jagatap 1
- Yikun Jiang 1
- Yu Jingsi 1
- Gouthaman KV 1
- Vijay Kamarshi 1
- Yuxuan Li 1
- Zekun Li 1
- Haoxuan Li 1
- Joy Jia Yin Lim 1
- Brian S. Lin 1
- Xiaoyu Liu 1
- Fuxiao Liu 1
- Zhiyuan Liu 1
- Xinyu Luo 1
- Julian McAuley 1
- Jialu Shi 1
- Qi Shi 1
- Maosong Sun (孙茂松) 1
- Shangqing Tu 1
- Haoliang Wang 1
- Yijun Wang 1
- Shouli Wang 1
- Shuo Wang 1
- Junda Wu 1
- Dan Xiao 1
- Chenhui Xie 1
- Paiheng Xu 1
- Yifan Yang 1
- Tong Yu 1
- Jifan Yu 1
- Yu Zhang 1
- Daniel Zhang-Li 1
- Yuhang Zhou (周宇航) 1
- Zihan Zhou 1