Xi Yang
2026
EmotionTalk: An Interactive Chinese Multimodal Emotion Dataset With Rich Annotations
Haoqin Sun | Jinghua Zhao | Xuechen Wang | Shiwan Zhao | Jiaming Zhou | Hui Wang | Xi Yang | Yequan Wang | Yonghua Lin
Findings of the Association for Computational Linguistics: ACL 2026
The advancement of Multimodal Emotion Recognition (MER) in Chinese is significantly hindered by the scarcity of high-quality, spontaneous dialogue datasets compared to their English counterparts. In this work, we introduce EmotionTalk, the first interactive Chinese multimodal dataset designed to capture the nuance of authentic emotional interplay. Collected from 19 professional actors, the dataset spans 23.6 hours of dyadic conversations across diverse scenarios. A key contribution of EmotionTalk is its multi-grained annotation system, which integrates standard categorical and dimensional labels with fine-grained emotional speaking style captions, enabling research into interpretable emotion analysis. We establish comprehensive benchmarks for emotion recognition and captioning tasks, verifying the dataset’s effectiveness and the necessity of multimodal fusion. EmotionTalk serves as a critical resource for bridging the gap in non-English affective computing and is publicly released for the research community.
LaoBench: A Large-Scale Multidimensional Lao Benchmark for Large Language Models
Jian Gao | Richeng Xuan | Zhaolu Kang | Dingshi Liao | Wenxin Huang | Zongmou Huang | Yangdi Xu | Bowen Qin | Zheqi He | Xi Yang | Changjinli | Yonghua Lin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The rapid advancement of large language models (LLMs) has not been matched by their evaluation in low-resource languages, especially Southeast Asian languages like Lao. To fill this gap, we introduce LaoBench, the first large-scale, high-quality, and multidimensional benchmark for assessing LLM language understanding and reasoning in Lao. LaoBench contains 17,000+ expert-curated samples across three dimensions: culturally grounded knowledge application, curriculum-aligned K12 education, and bilingual translation among Lao, Chinese, and English. It includes open-source and held-out subsets, where the held-out portion enables secure black-box evaluation via a controlled service to improve fairness and data security. We construct LaoBench with a hybrid pipeline that combines expert authoring with agent-assisted verification, ensuring linguistic accuracy, cultural relevance, and educational validity. We evaluate diverse state-of-the-art open-source and closed-source LLMs, and find that even strong multilingual models lag behind human experts, particularly in culturally grounded reasoning and translation fidelity. We hope LaoBench will catalyze research on Lao and other underrepresented Southeast Asian languages for more inclusive multilingual evaluation.
2025
FlagEval-Arena: A Side-by-Side Comparative Evaluation Platform for Large Language Models and Text-Driven AIGC
Jing-Shu Zheng | Richeng Xuan | Bowen Qin | Zheqi He | Tongshuai.ren Tongshuai.ren | Xuejing Li | Jin-Ge Yao | Xi Yang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
We introduce FlagEval-Arena, an evaluation platform for side-by-side comparisons of large language models and text-driven AIGC systems. Unlike the well-known LM Arena (LMSYS Chatbot Arena), our platform is built on our own reimplemented framework, which gives us the flexibility to introduce new mechanisms or features. Our platform enables side-by-side evaluation not only for language models or vision-language models, but also for text-to-image or text-to-video synthesis. We specifically target a Chinese audience, with more focus on the Chinese language, more models developed by Chinese institutes, and more general usage beyond the technical community. As a result, we currently observe interesting differences from the usual results presented by LM Arena. Our platform is available at https://flageval.baai.org/#/arena.
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation
Zheqi He | Yesheng Liu | Jing-Shu Zheng | Xuejing Li | Jin-Ge Yao | Bowen Qin | Richeng Xuan | Xi Yang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
We present FlagEvalMM, an open-source evaluation framework designed to comprehensively assess multimodal models across a diverse range of vision-language understanding and generation tasks, such as visual question answering, text-to-image/video generation, and image-text retrieval. We decouple model inference from evaluation through an independent evaluation service, thus enabling flexible resource allocation and seamless integration of new tasks and models. Moreover, FlagEvalMM utilizes advanced inference acceleration tools (e.g., vLLM, SGLang) and asynchronous data loading to significantly enhance evaluation efficiency. Extensive experiments show that FlagEvalMM offers accurate and efficient insights into model strengths and limitations, making it a valuable tool for advancing multimodal research. The framework is publicly accessible at https://github.com/flageval-baai/FlagEvalMM, with a demonstration video available at https://youtu.be/L7EtacjoM0k.
ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5
Jiaming Zhou | Shiyao Wang | Shiwan Zhao | Jiabei He | Haoqin Sun | Hui Wang | Cheng Liu | Aobo Kong | Yujie Guo | Xi Yang | Yequan Wang | Yonghua Lin | Yong Qin
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Automatic speech recognition (ASR) systems have advanced significantly with models like Whisper, Conformer, and self-supervised frameworks such as Wav2vec 2.0 and HuBERT. However, developing robust ASR models for young children’s speech remains challenging due to differences in pronunciation, tone, and pace compared to adult speech. In this paper, we introduce a new Mandarin speech dataset focused on children aged 3 to 5, addressing the scarcity of resources in this area. The dataset comprises 41.25 hours of speech with carefully crafted manual transcriptions, collected from 397 speakers across various provinces in China, with balanced gender representation. We provide a comprehensive analysis of speaker demographics, speech duration distribution and geographic coverage. Additionally, we evaluate ASR performance on models trained from scratch, such as Conformer, as well as fine-tuned pre-trained models like HuBERT and Whisper, where fine-tuning demonstrates significant performance improvements. Furthermore, we assess speaker verification (SV) on our dataset, showing that, despite the challenges posed by the unique vocal characteristics of young children, the dataset effectively supports both ASR and SV tasks. This dataset is a valuable contribution to Mandarin child speech research and holds potential for applications in educational technology and child-computer interaction. It will be open-source and freely available for all academic purposes.
Co-authors
- Zheqi He 3
- Yonghua Lin 3
- Bowen Qin 3
- Richeng Xuan 3
- Xuejing Li 2
- Haoqin Sun 2
- Hui Wang 2
- Yequan Wang 2
- Jin-Ge Yao 2
- Shiwan Zhao 2
- Jing-Shu Zheng 2
- Jiaming Zhou 2
- Changjinli 1
- Jian Gao 1
- Yujie Guo 1
- Jiabei He 1
- Wenxin Huang 1
- Zongmou Huang 1
- Zhaolu Kang 1
- Aobo Kong 1
- Dingshi Liao 1
- Yesheng Liu 1
- Cheng Liu 1
- Yong Qin 1
- Tongshuai.ren Tongshuai.ren 1
- Xuechen Wang 1
- Shiyao Wang 1
- Yangdi Xu 1
- Jinghua Zhao 1