Zhiyang Deng
2026
MultiFinBen: Benchmarking Large Language Models for Multilingual and Multimodal Financial Application
Xueqing Peng | Lingfei Qian | Yan Wang | Ruoyu Xiang | Yueru He | Yang Ren | Mingyang Jiang | Vincent Jim Zhang | Yuqing Guo | Jeff Zhao | Huan He | Yi Han | Yun Feng | Yuechen Jiang | Yupeng Cao | Haohang Li | Yangyang Yu | Xiaoyu Wang | Penglei Gao | Shengyuan Lin | Keyi Wang | Shanshan Yang | Yilun Zhao | Zhiwei Liu | Peng Lu | Jerry Huang | Suyuchen Wang | Triantafillos Papadopoulos | Polydoros Giannouris | Efstathia Soufleri | Nuo Chen | Zhiyang Deng | Heming Fu | Yijia Zhao | Mingquan Lin | Meikang Qiu | Kaleb E Smith | Arman Cohan | Xiao-Yang Liu | Jimin Huang | Guojun Xiong | Alejandro Lopez-Lira | Xi Chen | Junichi Tsujii | Jian-Yun Nie | Sophia Ananiadou | Qianqian Xie
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Xueqing Peng | Lingfei Qian | Yan Wang | Ruoyu Xiang | Yueru He | Yang Ren | Mingyang Jiang | Vincent Jim Zhang | Yuqing Guo | Jeff Zhao | Huan He | Yi Han | Yun Feng | Yuechen Jiang | Yupeng Cao | Haohang Li | Yangyang Yu | Xiaoyu Wang | Penglei Gao | Shengyuan Lin | Keyi Wang | Shanshan Yang | Yilun Zhao | Zhiwei Liu | Peng Lu | Jerry Huang | Suyuchen Wang | Triantafillos Papadopoulos | Polydoros Giannouris | Efstathia Soufleri | Nuo Chen | Zhiyang Deng | Heming Fu | Yijia Zhao | Mingquan Lin | Meikang Qiu | Kaleb E Smith | Arman Cohan | Xiao-Yang Liu | Jimin Huang | Guojun Xiong | Alejandro Lopez-Lira | Xi Chen | Junichi Tsujii | Jian-Yun Nie | Sophia Ananiadou | Qianqian Xie
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Real-world financial analysis involves information across multiple languages and modalities, from reports and news to scanned filings and meeting recordings. Yet most existing evaluations of LLMs in finance remain text-only, monolingual, and largely saturated by current models. To bridge these gaps, we present MultiFinBen, the first expert-annotated multilingual (five languages) and multimodal (text, vision, audio) benchmark for evaluating LLMs in realistic financial contexts. MultiFinBen introduces two new task families: multilingual financial reasoning, which tests cross-lingual evidence integration from filings and news, and financial OCR, which extracts structured text from scanned documents containing tables and charts. Rather than aggregating all available datasets, we apply a structured, difficulty-aware selection based on advanced model performance, ensuring balanced challenge and removing redundant tasks. Evaluating 21 leading LLMs shows that even frontier multimodal models like GPT-4o achieve only 46.01% overall, stronger on vision and audio but dropping sharply in multilingual settings. These findings expose persistent limitations in multilingual, multimodal, and expert-level financial reasoning. All datasets, evaluation scripts, and leaderboards are publicly released.
All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection
Yuechen Jiang | Zhiwei Liu | Yupeng Cao | Yueru He | Ziyang Xu | Chen Xu | Zhiyang Deng | Prayag Tiwari | Xi Chen | Alejandro Lopez-Lira | Jimin Huang | Junichi Tsujii | Sophia Ananiadou
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Yuechen Jiang | Zhiwei Liu | Yupeng Cao | Yueru He | Ziyang Xu | Chen Xu | Zhiyang Deng | Prayag Tiwari | Xi Chen | Alejandro Lopez-Lira | Jimin Huang | Junichi Tsujii | Sophia Ananiadou
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We introduce RFC-Bench, a benchmark for evaluating large language models on financial misinformation under realistic news. RFC-Bench operates at the paragraph level and captures the contextual complexity of financial news where meaning emerges from dispersed cues. The benchmark defines two complementary tasks: reference-free misinformation detection and comparison-based diagnosis using paired original–perturbed inputs. Experiments reveal a consistent pattern: performance is substantially stronger when comparative context is available, while reference-free settings expose significant weaknesses, including unstable predictions and elevated invalid outputs. These results indicate that current models struggle to maintain coherent belief states without external grounding. By highlighting this gap, RFC-Bench provides a structured testbed for studying reference-free reasoning and advancing more reliable financial misinformation detection in real-world settings.
2025
FinNLP-FNP-LLMFinLegal @ COLING 2025 Shared Task: Agent-Based Single Cryptocurrency Trading Challenge
Yangyang Yu | Haohang Li | Yupeng Cao | Keyi Wang | Zhiyang Deng | Zhiyuan Yao | Yuechen Jiang | Dong Li | Ruey-Ling Weng | Jordan W. Suchow
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)
Yangyang Yu | Haohang Li | Yupeng Cao | Keyi Wang | Zhiyang Deng | Zhiyuan Yao | Yuechen Jiang | Dong Li | Ruey-Ling Weng | Jordan W. Suchow
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)
Despite the promise of large language models based agent framework in stock trading task, their capabilities for comprehensive analysis and multiple different financial assets remain largely unexplored, such as cryptocurrency trading. To evaluate the capabilities of LLM-based agent framework in cryptocurrency trading, we introduce an LLMs-based financial shared task featured at COLING 2025 FinNLP-FNP-LLMFinLegal workshop, named Agent-based Single Cryptocurrency Trading Challenge. This challenge includes two cryptocurrencies: BitCoin and Ethereum. In this paper, we provide an overview of these tasks and datasets, summarize participants’ methods, and present their experimental evaluations, highlighting the effectiveness of LLMs in addressing cryptocurrency trading challenges. To the best of our knowledge, the Agent-based Single Cryptocurrency Trading Challenge is one of the first challenges for assessing LLMs in the financial area. In consequence, we provide detailed observations and take away conclusions for future development in this area.
INVESTORBENCH: A Benchmark for Financial Decision-Making Tasks with LLM-based Agent
Haohang Li | Yupeng Cao | Yangyang Yu | Shashidhar Reddy Javaji | Zhiyang Deng | Yueru He | Yuechen Jiang | Zining Zhu | K.p. Subbalakshmi | Jimin Huang | Lingfei Qian | Xueqing Peng | Jordan W. Suchow | Qianqian Xie
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Haohang Li | Yupeng Cao | Yangyang Yu | Shashidhar Reddy Javaji | Zhiyang Deng | Yueru He | Yuechen Jiang | Zining Zhu | K.p. Subbalakshmi | Jimin Huang | Lingfei Qian | Xueqing Peng | Jordan W. Suchow | Qianqian Xie
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent advancements have underscored the potential of large language model (LLM)-based agents in financial decision-making. Despite this progress, the field currently encounters two main challenges: (1) the lack of a comprehensive LLM agent framework adaptable to a variety of financial tasks, and (2) the absence of standardized benchmarks and consistent datasets for assessing agent performance. To tackle these issues, we introduce InvestorBench, the first benchmark specifically designed for evaluating LLM-based agents in diverse financial decision-making contexts. InvestorBench enhances the versatility of LLM-enabled agents by providing a comprehensive suite of tasks applicable to different financial products, including single equities like stocks and cryptocurrencies, and exchange-traded funds (ETFs). Additionally, we assess the reasoning and decision-making capabilities of our agent framework using thirteen different LLMs as backbone models, across various market environments and tasks. Furthermore, we have curated a diverse collection of open-source, datasets and developed a comprehensive suite of environments for financial decision-making. This establishes a highly accessible platform for evaluating financial agents’ performance across various scenarios.
FLAG-TRADER: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading
Guojun Xiong | Zhiyang Deng | Keyi Wang | Yupeng Cao | Haohang Li | Yangyang Yu | Xueqing Peng | Mingquan Lin | Kaleb E Smith | Xiao-Yang Liu | Jimin Huang | Sophia Ananiadou | Qianqian Xie
Findings of the Association for Computational Linguistics: ACL 2025
Guojun Xiong | Zhiyang Deng | Keyi Wang | Yupeng Cao | Haohang Li | Yangyang Yu | Xueqing Peng | Mingquan Lin | Kaleb E Smith | Xiao-Yang Liu | Jimin Huang | Sophia Ananiadou | Qianqian Xie
Findings of the Association for Computational Linguistics: ACL 2025
Large language models (LLMs) fine-tuned on multimodal financial data have demonstrated impressive reasoning capabilities in various financial tasks. However, they often struggle with multi-step, goal-oriented scenarios in interactive financial markets, such as trading, where complex agentic approaches are required to improve decision-making. To address this, we propose FLAG-Trader, a unified architecture integrating linguistic processing (via LLMs) with gradient-driven reinforcement learning (RL) policy optimization, in which a partially fine-tuned LLM acts as the policy network, leveraging pre-trained knowledge while adapting to the financial domain through parameter-efficient fine-tuning. Through policy gradient optimization driven by trading rewards, our framework not only enhances LLM performance in trading but also improves results on other financial-domain tasks. We present extensive empirical evidence to validate these enhancements.
2024
Search
Fix author
Co-authors
- Yupeng Cao 6
- Jimin Huang 4
- Yuechen Jiang 4
- Haohang Li 4
- Yangyang Yu 4
- Sophia Ananiadou 3
- Yueru He 3
- Xueqing Peng 3
- Qianqian Xie 3
- Xi Chen 2
- Mingquan Lin 2
- Zhiwei Liu 2
- Xiao-Yang Liu 2
- Alejandro Lopez-Lira 2
- Lingfei Qian 2
- Kaleb E. Smith 2
- Jordan W. Suchow 2
- Jun’ichi Tsujii 2
- Keyi Wang 2
- Guojun Xiong 2
- Zhiyuan Yao 2
- Nuo Chen 1
- Zhi Chen 1
- Arman Cohan 1
- Yun Feng 1
- Heming Fu 1
- Penglei Gao 1
- Polydoros Giannouris 1
- Yuqing Guo 1
- Yi Han 1
- Huan He 1
- Jerry Huang 1
- Shashidhar Reddy Javaji 1
- Mingyang Jiang 1
- Dong Li 1
- Shengyuan Lin 1
- Peng Lu 1
- Jian-Yun Nie 1
- Triantafillos Papadopoulos 1
- Meikang Qiu 1
- Yang Ren 1
- Efstathia Soufleri 1
- K.p. Subbalakshmi 1
- Prayag Tiwari 1
- Yan Wang 1
- Xiaoyu Wang 1
- Suyuchen Wang 1
- Keyi Wang 1
- Ruey-Ling Weng 1
- Ruoyu Xiang 1
- Ziyang Xu 1
- Chen Xu 1
- Shanshan Yang 1
- Vincent Jim Zhang 1
- Jeff Zhao 1
- Yilun Zhao 1
- Yijia Zhao 1
- Zining Zhu 1