Lingfei Qian
2026
MultiFinBen: Benchmarking Large Language Models for Multilingual and Multimodal Financial Application
Xueqing Peng | Lingfei Qian | Yan Wang | Ruoyu Xiang | Yueru He | Yang Ren | Mingyang Jiang | Vincent Jim Zhang | Yuqing Guo | Jeff Zhao | Huan He | Yi Han | Yun Feng | Yuechen Jiang | Yupeng Cao | Haohang Li | Yangyang Yu | Xiaoyu Wang | Penglei Gao | Shengyuan Lin | Keyi Wang | Shanshan Yang | Yilun Zhao | Zhiwei Liu | Peng Lu | Jerry Huang | Suyuchen Wang | Triantafillos Papadopoulos | Polydoros Giannouris | Efstathia Soufleri | Nuo Chen | Zhiyang Deng | Heming Fu | Yijia Zhao | Mingquan Lin | Meikang Qiu | Kaleb E Smith | Arman Cohan | Xiao-Yang Liu | Jimin Huang | Guojun Xiong | Alejandro Lopez-Lira | Xi Chen | Junichi Tsujii | Jian-Yun Nie | Sophia Ananiadou | Qianqian Xie
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Real-world financial analysis involves information across multiple languages and modalities, from reports and news to scanned filings and meeting recordings. Yet most existing evaluations of LLMs in finance remain text-only, monolingual, and largely saturated by current models. To bridge these gaps, we present MultiFinBen, the first expert-annotated multilingual (five languages) and multimodal (text, vision, audio) benchmark for evaluating LLMs in realistic financial contexts. MultiFinBen introduces two new task families: multilingual financial reasoning, which tests cross-lingual evidence integration from filings and news, and financial OCR, which extracts structured text from scanned documents containing tables and charts. Rather than aggregating all available datasets, we apply a structured, difficulty-aware selection based on advanced model performance, ensuring balanced challenge and removing redundant tasks. Evaluating 21 leading LLMs shows that even frontier multimodal models like GPT-4o achieve only 46.01% overall, stronger on vision and audio but dropping sharply in multilingual settings. These findings expose persistent limitations in multilingual, multimodal, and expert-level financial reasoning. All datasets, evaluation scripts, and leaderboards are publicly released.
FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning
Zhuohan Xie | Daniil Orel | Rushil Thareja | Dhruv Sahnan | Hachem Madmoun | Fan Zhang | Debopriyo Banerjee | Georgi Nenkov Georgiev | Xueqing Peng | Lingfei Qian | Jimin Huang | Jinyan Su | Aaryamonvikram Singh | Rui Xing | Rania Elbadry | Chen Xu | Haonan Li | Fajri Koto | Ivan Koychev | Tanmoy Chakraborty | Yuxia Wang | Salem Lahlou | Veselin Stoyanov | Sophia Ananiadou | Preslav Nakov
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multi-step symbolic reasoning is essential for robust financial analysis; yet, current benchmarks largely overlook this capability. Existing datasets such as FinQA and ConvFinQA emphasize final numerical answers while neglecting the intermediate reasoning steps required for transparency and verification. To address this gap, we introduce FinChain, the first benchmark specifically designed for verifiable Chain-of-Thought evaluation in finance. FinChain spans 58 topics across 12 financial domains, each represented by parameterized symbolic templates with executable Python code that enable fully machine-verifiable reasoning and scalable, contamination-free data generation. To assess reasoning capacity, we propose ChainEval, a dynamic alignment measure that jointly evaluates both the final-answer correctness and the step-level reasoning consistency. Our evaluation of 26 leading LLMs reveals that even frontier LLMs exhibit clear limitations in symbolic financial reasoning, while domain-adapted and math-enhanced fine-tuned models can substantially narrow this gap. Overall, FinChain exposes persistent weaknesses in multi-step financial reasoning and provides a foundation for developing trustworthy, interpretable, and verifiable financial AI. This project is available at https://github.com/mbzuai-nlp/finchain.git.
Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection
Zhiwei Liu | Yupeng Cao | Yuechen Jiang | Mohsinul Kabir | Polydoros Giannouris | Chen Xu | Ziyang Xu | Tianlei Zhu | Md. Tariquzzaman | Triantafillos Papadopoulos | Yan Wang | Lingfei Qian | Xueqing Peng | Zhuohan Xie | Ye Yuan | Saeed Almheiri | Abdulrazzaq Alnajjar | Ming-Bin Chen | Harry Stuart | Paul Thompson | Prayag Tiwari | Alejandro Lopez-Lira | Xue Liu | Jimin Huang | Sophia Ananiadou
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) have been widely applied across various domains of finance. Since their training data are largely derived from human-authored corpora, LLMs may inherit a range of human biases. Behavioral biases can lead to instability and uncertainty in decision-making, particularly when processing financial information. However, existing research on LLM bias has mainly focused on direct questioning or simplified, general-purpose settings, with limited consideration of the complex real-world financial environments and high-risk, context-sensitive, multilingual financial misinformation detection tasks (MFMD). In this work, we propose MFMDScen, a comprehensive benchmark for evaluating behavioral biases of LLMs in MFMD across diverse economic scenarios. In collaboration with financial experts, we construct three types of complex financial scenarios: (i) role- and personality-based, (ii) role- and region-based, and (iii) role-based scenarios incorporating ethnicity and religious beliefs. We further develop a multilingual financial misinformation dataset covering English, Chinese, Greek, and Bengali. By integrating these scenarios with misinformation claims, MFMDScen enables a systematic evaluation of 22 mainstream LLMs. Our findings reveal that pronounced behavioral biases persist across both commercial and open-source models. This project is available at https://github.com/lzw108/FMD.
2025
Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance
Xueqing Peng | Triantafillos Papadopoulos | Efstathia Soufleri | Polydoros Giannouris | Ruoyu Xiang | Yan Wang | Lingfei Qian | Jimin Huang | Qianqian Xie | Sophia Ananiadou
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Despite Greece’s pivotal role in the global economy, large language models (LLMs) remain underexplored in the Greek financial context due to the linguistic complexity of Greek and the scarcity of domain-specific datasets. While multilingual financial NLP has revealed large performance gaps across languages, no benchmarks or LLMs have been tailored for Greek financial tasks until now. To bridge this gap, we introduce Plutus-ben, the first Greek Financial Evaluation Benchmark, and Plutus-8B, the first financial LLM fine-tuned on Greek-specific financial data. Plutus-ben addresses six core tasks: numeric/textual named entity recognition, question answering, extractive summarization, abstractive summarization, and topic classification. To support these tasks, we release four new expert-annotated Greek financial datasets and incorporate two existing resources. Our comprehensive evaluation of 24 LLMs reveals persistent challenges in Greek financial NLP, driven by linguistic complexity, domain terminology, and financial reasoning gaps. Experimental results underscore the limitations of cross-lingual transfer and the need for Greek-specific financial modeling. We publicly release Plutus-ben, Plutus-8B, and all associated datasets to promote reproducible research and advance multilingual financial NLP.
INVESTORBENCH: A Benchmark for Financial Decision-Making Tasks with LLM-based Agent
Haohang Li | Yupeng Cao | Yangyang Yu | Shashidhar Reddy Javaji | Zhiyang Deng | Yueru He | Yuechen Jiang | Zining Zhu | K.p. Subbalakshmi | Jimin Huang | Lingfei Qian | Xueqing Peng | Jordan W. Suchow | Qianqian Xie
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent advancements have underscored the potential of large language model (LLM)-based agents in financial decision-making. Despite this progress, the field currently encounters two main challenges: (1) the lack of a comprehensive LLM agent framework adaptable to a variety of financial tasks, and (2) the absence of standardized benchmarks and consistent datasets for assessing agent performance. To tackle these issues, we introduce InvestorBench, the first benchmark specifically designed for evaluating LLM-based agents in diverse financial decision-making contexts. InvestorBench enhances the versatility of LLM-enabled agents by providing a comprehensive suite of tasks applicable to different financial products, including single equities like stocks and cryptocurrencies, and exchange-traded funds (ETFs). Additionally, we assess the reasoning and decision-making capabilities of our agent framework using thirteen different LLMs as backbone models, across various market environments and tasks. Furthermore, we have curated a diverse collection of open-source datasets and developed a comprehensive suite of environments for financial decision-making. This establishes a highly accessible platform for evaluating financial agents’ performance across various scenarios.
Co-authors
- Jimin Huang 5
- Xueqing Peng 5
- Sophia Ananiadou 4
- Yupeng Cao 3
- Polydoros Giannouris 3
- Yuechen Jiang 3
- Triantafillos Papadopoulos 3
- Qianqian Xie 3
- Zhiyang Deng 2
- Yueru He 2
- Haohang Li 2
- Zhiwei Liu 2
- Alejandro Lopez-Lira 2
- Efstathia Soufleri 2
- Yan Wang 2
- Ruoyu Xiang 2
- Zhuohan Xie 2
- Chen Xu 2
- Yangyang Yu 2
- Saeed Almheiri 1
- Abdulrazzaq Alnajjar 1
- Debopriyo Banerjee 1
- Tanmoy Chakraborty 1
- Nuo Chen 1
- Xi Chen 1
- Ming-Bin Chen 1
- Arman Cohan 1
- Rania Elbadry 1
- Yun Feng 1
- Heming Fu 1
- Penglei Gao 1
- Georgi Nenkov Georgiev 1
- Yuqing Guo 1
- Yi Han 1
- Huan He 1
- Jerry Huang 1
- Shashidhar Reddy Javaji 1
- Mingyang Jiang 1
- Mohsinul Kabir 1
- Fajri Koto 1
- Ivan Koychev 1
- Salem Lahlou 1
- Haonan Li 1
- Shengyuan Lin 1
- Mingquan Lin 1
- Xiao-Yang Liu 1
- Xue Liu 1
- Peng Lu 1
- Hachem Madmoun 1
- Preslav Nakov 1
- Jian-Yun Nie 1
- Daniil Orel 1
- Meikang Qiu 1
- Yang Ren 1
- Dhruv Sahnan 1
- Aaryamonvikram Singh 1
- Kaleb E. Smith 1
- Veselin Stoyanov 1
- Harry Stuart 1
- Jinyan Su 1
- K.p. Subbalakshmi 1
- Jordan W. Suchow 1
- Md. Tariquzzaman 1
- Rushil Thareja 1
- Paul Thompson 1
- Prayag Tiwari 1
- Jun’ichi Tsujii 1
- Xiaoyu Wang 1
- Keyi Wang 1
- Suyuchen Wang 1
- Yuxia Wang 1
- Yan Wang 1
- Rui Xing 1
- Guojun Xiong 1
- Ziyang Xu 1
- Shanshan Yang 1
- Ye Yuan 1
- Vincent Jim Zhang 1
- Fan Zhang 1
- Jeff Zhao 1
- Yilun Zhao 1
- Yijia Zhao 1
- Tianlei Zhu 1
- Zining Zhu 1