Yuan Liu
2026
Beyond Transcription: Unified Audio Schema for Perception-Aware AudioLLMs
Linhao Zhang | Yuhan Song | Aiwei Liu | Chuhan Wu | Sijun Zhang | Wei Jia | Yuan Liu | Houfeng Wang | Zhou Xiao
Findings of the Association for Computational Linguistics: ACL 2026
Recent Audio Large Language Models (AudioLLMs) exhibit a striking performance inversion: while excelling at complex reasoning tasks, they consistently underperform on fine-grained acoustic perception. We attribute this gap to a fundamental limitation of ASR-centric training, which provides precise linguistic targets but implicitly teaches models to suppress paralinguistic cues and acoustic events as noise. To address this, we propose Unified Audio Schema (UAS), a holistic and structured supervision framework that organizes audio information into three explicit components—Transcription, Paralinguistics, and Non-linguistic Events—within a unified JSON format. This design achieves comprehensive acoustic coverage without sacrificing the tight audio-text alignment that enables reasoning. We validate the effectiveness of this supervision strategy by applying it to both discrete and continuous AudioLLM architectures. Extensive experiments on MMSU, MMAR, and MMAU demonstrate that UAS-Audio yields consistent improvements, boosting fine-grained perception by 10.9% on MMSU over the same-size state-of-the-art models while preserving robust reasoning capabilities. Our code and model are publicly available at https://github.com/Tencent/Unified_Audio_Schema.
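To make the three-part target concrete, here is a minimal Python sketch of what a UAS-style supervision record could look like when serialized to JSON. The field names (`transcription`, `paralinguistics`, `non_linguistic_events`) and the attributes inside them (emotion, gender, speaking rate, event timestamps) are illustrative assumptions. They are not the schema released in the UAS repository.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Paralinguistics:
    # Hypothetical speaker-state attributes; the real UAS fields may differ.
    emotion: str
    gender: str
    speaking_rate: str


@dataclass
class AudioEvent:
    # A non-linguistic sound with an assumed (start, end) span in seconds.
    label: str
    start_s: float
    end_s: float


@dataclass
class UnifiedAudioTarget:
    transcription: str
    paralinguistics: Paralinguistics
    non_linguistic_events: List[AudioEvent] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses and lists of dataclasses.
        return json.dumps(asdict(self), ensure_ascii=False)


target = UnifiedAudioTarget(
    transcription="Could you close the window?",
    paralinguistics=Paralinguistics(emotion="annoyed", gender="female", speaking_rate="fast"),
    non_linguistic_events=[AudioEvent("dog_bark", 1.2, 1.8)],
)
print(target.to_json())
```

Keeping all three components in one JSON object means a single text target carries both the words and the acoustic context around them, which is the coverage argument the abstract makes.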
AEGIS: A Holistic Benchmark for Evaluating Forensic Analysis of AI-Generated Academic Images
Bo Zhang | Tzu-Yen Ma | Zichen Tang | Junpeng Ding | Zirui Wang | Yizhuo Zhao | Peilin Gao | Zijie Xi | Zixin Ding | Haiyang Sun | Haocheng Gao | Yuan Liu | Liangjia Wang | Yiling Huang | Yujie Wang | Yuyue Zhang | Ronghui Xi | Yuanze Li | Jiacheng Liu | Zhongjun Yang | Haihong E
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We introduce AEGIS, A holistic benchmark for Evaluating forensic analysis of AI-Generated academic ImageS. Compared to existing benchmarks, AEGIS features three key advances: (1) Domain-Specific Complexity: covering seven academic categories with 39 fine-grained subtypes, exposing intrinsic forensic difficulty, where even GPT-5.1 reaches 48.80% overall performance and expert models achieve only limited localization accuracy (IoU 30.09%); (2) Diverse Forgery Simulations: modeling four prevalent academic forgery strategies across 25 generative models, with 11 yielding average forensic accuracy below 50%, showing that forensics lag behind generative advances; and (3) Multi-Dimensional Forensic Evaluation: jointly assessing detection, reasoning, and localization, revealing complementary strengths between model families, with multimodal large language models (MLLMs) at 84.74% accuracy in textual artifact recognition and expert detectors peaking at 79.54% accuracy in binary authenticity detection. By evaluating 25 leading MLLMs, nine expert models, and one unified multimodal understanding and generation model, AEGIS serves as a diagnostic testbed exposing fundamental limitations in academic image forensics.
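The abstract reports localization as IoU (intersection over union). Below is a minimal sketch of box IoU and its mean over paired predictions. It assumes axis-aligned `(x1, y1, x2, y2)` boxes, whereas AEGIS may score localization on pixel masks or another region format. Treat it as an illustration of the metric, not the benchmark's scorer; `box_iou` and `mean_iou` are hypothetical helper names.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], gts: Sequence[Box]) -> float:
    """Average IoU over prediction/ground-truth pairs."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    return sum(box_iou(p, g) for p, g in zip(preds, gts)) / len(preds)


print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 1/7, about 0.1429
```

A mean IoU of 30.09%, as reported for expert models, therefore means predicted regions typically overlap less than a third of the union with the true forged region.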
2025
EquiBench: Benchmarking Large Language Models’ Reasoning about Program Semantics via Equivalence Checking
Anjiang Wei | Jiannan Cao | Ran Li | Hongyu Chen | Yuhui Zhang | Ziheng Wang | Yuan Liu | Thiago S. F. X. Teixeira | Diyi Yang | Ke Wang | Alex Aiken
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
As large language models (LLMs) become integral to code-related tasks, a central question emerges: Do LLMs truly understand program semantics? We introduce EquiBench, a new benchmark for evaluating LLMs through equivalence checking, i.e., determining whether two programs produce identical outputs for all possible inputs. Unlike prior code generation benchmarks, this task directly tests a model’s ability to reason about program semantics. EquiBench consists of 2400 program pairs across four languages and six categories. These pairs are generated through program analysis, compiler scheduling, and superoptimization, ensuring high-confidence labels, nontrivial difficulty, and full automation. We evaluate 19 state-of-the-art LLMs and find that in the most challenging categories, the best accuracies are 63.8% and 76.2%, only modestly above the 50% random baseline. Further analysis reveals that models often rely on syntactic similarity rather than exhibiting robust reasoning about program semantics, highlighting current limitations. Our code and dataset are publicly available at https://github.com/Anjiang-Wei/equibench
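The task definition (two programs are equivalent if they produce identical outputs for all inputs) can be illustrated with a small differential-testing sketch. Random testing can only refute equivalence by finding a differing input. It cannot prove equivalence, and EquiBench derives its labels from program analysis, compiler scheduling, and superoptimization rather than from testing. The functions below are hypothetical examples chosen so that one pair is equivalent and one is not.

```python
import random
from typing import Callable, Optional


def find_counterexample(
    f: Callable[[int], int],
    g: Callable[[int], int],
    trials: int = 1000,
    lo: int = -100,
    hi: int = 100,
    seed: int = 0,
) -> Optional[int]:
    """Return an input on which f and g disagree, or None if none was found."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(lo, hi)
        if f(x) != g(x):
            return x
    return None  # no disagreement observed; this is NOT a proof of equivalence


def half_floor(x: int) -> int:
    return x // 2  # floor division


def half_shift(x: int) -> int:
    return x >> 1  # arithmetic shift, which also floors for Python ints


def half_trunc(x: int) -> int:
    return int(x / 2)  # truncates toward zero, so it differs for negative odd x


print(find_counterexample(half_floor, half_shift))  # None
print(find_counterexample(half_floor, half_trunc))  # some negative odd integer
```

The pair `half_floor`/`half_trunc` looks syntactically close yet is not equivalent, which is the kind of case where relying on surface similarity, as the abstract reports models often do, leads to wrong answers.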
Judge and Improve: Towards a Better Reasoning of Knowledge Graphs with Large Language Models
Mo Zhiqiang | Yang Hua | Jiahui Li | Yuan Liu | Shawn Wong | Jianmin Huang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Graph Neural Networks (GNNs) have shown immense potential in improving the performance of large-scale models by effectively incorporating structured relational information. However, current approaches face two key challenges: (1) achieving robust semantic alignment between graph representations and large models, and (2) ensuring interpretability in the generated outputs. To address these challenges, we propose ExGLM (Explainable Graph Language Model), a novel training framework designed to seamlessly integrate graph and language modalities while enhancing transparency. Our framework introduces two core components: (1) a graph-language synergistic alignment module, which aligns graph structures with the language model to ensure semantic consistency across modalities; and (2) a judge-and-improve paradigm, which allows the language model to iteratively evaluate, refine, and prioritize responses with higher interpretability, thereby improving both performance and transparency. Extensive experiments conducted on three benchmark datasets (ogbn-arxiv, Cora, and PubMed) demonstrate that ExGLM not only surpasses existing methods in efficiency but also generates outputs that are significantly more interpretable, effectively addressing the primary limitations of current approaches.
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion
Yuan Liu | Zhongyin Zhao | Le Tian | Haicheng Wang | Xubing Ye | Yangxiu You | Zilin Yu | Chuhan Wu | Zhou Xiao | Yang Yu | Jie Zhou
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
High-quality labeled data is essential for training accurate document conversion models, particularly in domains with complex formats such as tables, formulas, and multi-column text. However, manual annotation is both costly and time-consuming, while automatic labeling using existing models often lacks accuracy in handling such challenging scenarios. Consequently, training student models by distilling outputs from teacher models can significantly limit their performance in real-world applications. In this paper, we propose a fully automated, distillation-free framework comprising two stages for constructing high-quality document extraction datasets and models capable of handling diverse document formats and layouts. In the first stage, we introduce a method for generating large-scale, diverse synthetic data, which enables a model to extract key elements in a unified format with strong initial performance. In the second stage, we present a self-improvement approach that further adapts the model, initially trained on synthetic data, to real-world documents. Specifically, we first use the fine-tuned model to annotate real documents, then apply a suite of filtering strategies to verify annotation quality, and finally retrain the model on the verified dataset. By iteratively repeating this process, we progressively enhance both the model’s conversion capabilities and the quality of the generated data. We train a public POINTS-1.5 model to obtain POINTS-Reader, which surpasses many existing public and proprietary models of comparable or larger size. Our model will be made publicly available.
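The second-stage loop (annotate real documents, filter, retrain, repeat) can be sketched generically. In the sketch below, `self_improve` and the toy `annotate`, filter, and `retrain` functions are hypothetical stand-ins: the abstract does not specify the filtering strategies or training procedure. Here a "model" is just an integer skill level and a "document" an integer difficulty, which is enough to show both the model and the volume of verified data growing round by round.

```python
from typing import Any, Callable, List, Sequence, Tuple

Pair = Tuple[Any, Any]  # (document, annotation)


def self_improve(
    model: Any,
    documents: Sequence[Any],
    annotate: Callable[[Any, Any], Any],
    filters: Sequence[Callable[[Any, Any], bool]],
    retrain: Callable[[Any, List[Pair]], Any],
    rounds: int = 3,
) -> Tuple[Any, List[int]]:
    """Iteratively annotate, filter, and retrain; return the model and kept counts."""
    kept_per_round: List[int] = []
    for _ in range(rounds):
        # 1. Annotate real documents with the current model.
        labeled = [(doc, annotate(model, doc)) for doc in documents]
        # 2. Keep only annotations that pass every quality filter.
        verified = [(d, a) for d, a in labeled if all(f(d, a) for f in filters)]
        # 3. Retrain on the verified subset only.
        model = retrain(model, verified)
        kept_per_round.append(len(verified))
    return model, kept_per_round


# Toy stand-ins (hypothetical): documents harder than the skill level get garbled.
def toy_annotate(skill: int, doc: int) -> str:
    return "ok" if doc <= skill else "garbled"


def toy_filter(doc: int, ann: str) -> bool:
    return ann == "ok"


def toy_retrain(skill: int, verified: List[Pair]) -> int:
    # Training on a verified document of difficulty d lifts skill to d + 1.
    return max([skill] + [doc + 1 for doc, _ in verified])


final_skill, kept = self_improve(2, [1, 2, 3, 4, 5], toy_annotate, [toy_filter], toy_retrain)
print(final_skill, kept)  # 5 [2, 3, 4]
```

Filtering before retraining is what separates this loop from plain self-distillation: unverified outputs never become training targets.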
2021
Covering a sentence in form and meaning with fewer retrieved sentences
Yuan Liu | Yves Lepage
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation
2018
DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications
Wei He | Kai Liu | Jing Liu | Yajuan Lyu | Shiqi Zhao | Xinyan Xiao | Yuan Liu | Yizhong Wang | Hua Wu | Qiaoqiao She | Xuan Liu | Tian Wu | Haifeng Wang
Proceedings of the Workshop on Machine Reading for Question Answering
This paper introduces DuReader, a new large-scale, open-domain Chinese machine reading comprehension (MRC) dataset, designed to address real-world MRC. DuReader has three advantages over previous MRC datasets: (1) data sources: questions and documents are based on Baidu Search and Baidu Zhidao, and answers are manually generated; (2) question types: it provides rich annotations for more question types, especially yes-no and opinion questions, which leaves more opportunities for the research community; (3) scale: it contains 200K questions, 420K answers and 1M documents, making it the largest Chinese MRC dataset so far. Experiments show that human performance is well above current state-of-the-art baseline systems, leaving plenty of room for the community to make improvements. To help the community make these improvements, both DuReader and baseline systems have been posted online. We also organize a shared competition to encourage the exploration of more models. Since the release of the task, there have been significant improvements over the baselines.
Co-authors
- Chuhan Wu 2
- Zhou Xiao 2
- Alex Aiken 1
- Jiannan Cao 1
- Hongyu Chen 1
- Junpeng Ding 1
- Zixin Ding 1
- Haihong E 1
- Peilin Gao 1
- Haocheng Gao 1
- Wei He 1
- Yang Hua 1
- Jianmin Huang 1
- Yiling Huang 1
- Wei Jia 1
- Yves Lepage 1
- Ran Li 1
- Jiahui Li 1
- Yuanze Li 1
- Aiwei Liu 1
- Jiacheng Liu 1
- Kai Liu 1
- Jing Liu (刘晶, 刘璟) 1
- Xuan Liu 1
- Yajuan Lyu 1
- Tzu-Yen Ma 1
- Qiaoqiao She 1
- Yuhan Song 1
- Haiyang Sun 1
- Zichen Tang 1
- Thiago S. F. X. Teixeira 1
- Le Tian 1
- Ziheng Wang 1
- Ke Wang 1
- Houfeng Wang 1
- Haicheng Wang 1
- Zirui Wang 1
- Liangjia Wang 1
- Yujie Wang 1
- Yizhong Wang 1
- Haifeng Wang 1
- Anjiang Wei 1
- Shawn Wong 1
- Hua Wu (吴华) 1
- Tian Wu 1
- Zijie Xi 1
- Ronghui Xi 1
- Xinyan Xiao 1
- Diyi Yang 1
- Zhongjun Yang 1
- Xubing Ye 1
- Yangxiu You 1
- Zilin Yu 1
- Yuhui Zhang 1
- Linhao Zhang 1
- Sijun Zhang 1
- Bo Zhang 1
- Yuyue Zhang 1
- Zhongyin Zhao 1
- Yizhuo Zhao 1
- Shiqi Zhao 1
- Mo Zhiqiang 1
- Jie Zhou 1
- Yang Yu 1