Yang Ren
2026
MultiFinBen: Benchmarking Large Language Models for Multilingual and Multimodal Financial Application
Xueqing Peng | Lingfei Qian | Yan Wang | Ruoyu Xiang | Yueru He | Yang Ren | Mingyang Jiang | Vincent Jim Zhang | Yuqing Guo | Jeff Zhao | Huan He | Yi Han | Yun Feng | Yuechen Jiang | Yupeng Cao | Haohang Li | Yangyang Yu | Xiaoyu Wang | Penglei Gao | Shengyuan Lin | Keyi Wang | Shanshan Yang | Yilun Zhao | Zhiwei Liu | Peng Lu | Jerry Huang | Suyuchen Wang | Triantafillos Papadopoulos | Polydoros Giannouris | Efstathia Soufleri | Nuo Chen | Zhiyang Deng | Heming Fu | Yijia Zhao | Mingquan Lin | Meikang Qiu | Kaleb E Smith | Arman Cohan | Xiao-Yang Liu | Jimin Huang | Guojun Xiong | Alejandro Lopez-Lira | Xi Chen | Junichi Tsujii | Jian-Yun Nie | Sophia Ananiadou | Qianqian Xie
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Real-world financial analysis involves information across multiple languages and modalities, from reports and news to scanned filings and meeting recordings. Yet most existing evaluations of LLMs in finance remain text-only, monolingual, and largely saturated by current models. To bridge these gaps, we present MultiFinBen, the first expert-annotated multilingual (five languages) and multimodal (text, vision, audio) benchmark for evaluating LLMs in realistic financial contexts. MultiFinBen introduces two new task families: multilingual financial reasoning, which tests cross-lingual evidence integration from filings and news, and financial OCR, which extracts structured text from scanned documents containing tables and charts. Rather than aggregating all available datasets, we apply a structured, difficulty-aware selection based on advanced model performance, ensuring balanced challenge and removing redundant tasks. Evaluating 21 leading LLMs shows that even frontier multimodal models like GPT-4o achieve only 46.01% overall, stronger on vision and audio but dropping sharply in multilingual settings. These findings expose persistent limitations in multilingual, multimodal, and expert-level financial reasoning. All datasets, evaluation scripts, and leaderboards are publicly released.
2025
GeAR: Graph-enhanced Agent for Retrieval-augmented Generation
Zhili Shen | Chenxin Diao | Pavlos Vougiouklis | Pascual Merita | Shriram Piramanayagam | Enting Chen | Damien Graux | Andre Melo | Ruofei Lai | Zeren Jiang | Zhongyang Li | Ye Qi | Yang Ren | Dandan Tu | Jeff Z. Pan
Findings of the Association for Computational Linguistics: ACL 2025
Retrieval-augmented Generation (RAG) relies on effective retrieval capabilities, yet traditional sparse and dense retrievers inherently struggle with multi-hop retrieval scenarios. In this paper, we introduce GeAR, a system that advances RAG performance through two key innovations: (i) an efficient graph expansion mechanism that augments any conventional base retriever, such as BM25, and (ii) an agent framework that incorporates the resulting graph-based retrieval into a multi-step retrieval framework. Our evaluation demonstrates GeAR’s superior retrieval capabilities across three multi-hop question answering datasets. Notably, our system achieves state-of-the-art results with improvements exceeding 10% on the challenging MuSiQue dataset, while consuming fewer tokens and requiring fewer iterations than existing multi-step retrieval systems. The project page is available at https://gear-rag.github.io.
Co-authors
- Sophia Ananiadou 1
- Yupeng Cao 1
- Nuo Chen 1
- Xi Chen 1
- Enting Chen 1
- Arman Cohan 1
- Zhiyang Deng 1
- Chenxin Diao 1
- Yun Feng 1
- Heming Fu 1
- Penglei Gao 1
- Polydoros Giannouris 1
- Damien Graux 1
- Yuqing Guo 1
- Yi Han 1
- Yueru He 1
- Huan He 1
- Jerry Huang 1
- Jimin Huang 1
- Mingyang Jiang 1
- Yuechen Jiang 1
- Zeren Jiang 1
- Ruofei Lai 1
- Haohang Li 1
- Zhongyang Li 1
- Shengyuan Lin 1
- Mingquan Lin 1
- Zhiwei Liu 1
- Xiao-Yang Liu 1
- Alejandro Lopez-Lira 1
- Peng Lu 1
- André Melo 1
- Pascual Merita 1
- Jian-Yun Nie 1
- Jeff Z. Pan 1
- Triantafillos Papadopoulos 1
- Xueqing Peng 1
- Shriram Piramanayagam 1
- Ye Qi 1
- Lingfei Qian 1
- Meikang Qiu 1
- Zhili Shen 1
- Kaleb E. Smith 1
- Efstathia Soufleri 1
- Jun’ichi Tsujii 1
- Dandan Tu 1
- Pavlos Vougiouklis 1
- Yan Wang 1
- Xiaoyu Wang 1
- Keyi Wang 1
- Suyuchen Wang 1
- Ruoyu Xiang 1
- Qianqian Xie 1
- Guojun Xiong 1
- Shanshan Yang 1
- Yangyang Yu 1
- Vincent Jim Zhang 1
- Jeff Zhao 1
- Yilun Zhao 1
- Yijia Zhao 1