Xueqing Peng


2025

INVESTORBENCH: A Benchmark for Financial Decision-Making Tasks with LLM-based Agent
Haohang Li | Yupeng Cao | Yangyang Yu | Shashidhar Reddy Javaji | Zhiyang Deng | Yueru He | Yuechen Jiang | Zining Zhu | K.p. Subbalakshmi | Jimin Huang | Lingfei Qian | Xueqing Peng | Jordan W. Suchow | Qianqian Xie
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent advancements have underscored the potential of large language model (LLM)-based agents in financial decision-making. Despite this progress, the field currently encounters two main challenges: (1) the lack of a comprehensive LLM agent framework adaptable to a variety of financial tasks, and (2) the absence of standardized benchmarks and consistent datasets for assessing agent performance. To tackle these issues, we introduce InvestorBench, the first benchmark specifically designed for evaluating LLM-based agents in diverse financial decision-making contexts. InvestorBench enhances the versatility of LLM-enabled agents by providing a comprehensive suite of tasks applicable to different financial products, including single equities like stocks and cryptocurrencies, and exchange-traded funds (ETFs). Additionally, we assess the reasoning and decision-making capabilities of our agent framework using thirteen different LLMs as backbone models, across various market environments and tasks. Furthermore, we have curated a diverse collection of open-source datasets and developed a comprehensive suite of environments for financial decision-making. This establishes a highly accessible platform for evaluating financial agents’ performance across various scenarios.

Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance
Xueqing Peng | Triantafillos Papadopoulos | Efstathia Soufleri | Polydoros Giannouris | Ruoyu Xiang | Yan Wang | Lingfei Qian | Jimin Huang | Qianqian Xie | Sophia Ananiadou
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Despite Greece’s pivotal role in the global economy, large language models (LLMs) remain underexplored in the Greek financial context due to the linguistic complexity of Greek and the scarcity of domain-specific datasets. While multilingual financial NLP has revealed large performance gaps across languages, no benchmarks or LLMs have been tailored for Greek financial tasks until now. To bridge this gap, we introduce Plutus-ben, the first Greek Financial Evaluation Benchmark, and Plutus-8B, the first financial LLM fine-tuned on Greek-specific financial data. Plutus-ben addresses six core tasks: numeric/textual named entity recognition, question answering, extractive summarization, abstractive summarization, and topic classification. To support these tasks, we release four new expert-annotated Greek financial datasets and incorporate two existing resources. Our comprehensive evaluation of 24 LLMs reveals persistent challenges in Greek financial NLP, driven by linguistic complexity, domain terminology, and financial reasoning gaps. Experimental results underscore the limitations of cross-lingual transfer and the need for Greek-specific financial modeling. We publicly release Plutus-ben, Plutus-8B, and all associated datasets to promote reproducible research and advance multilingual financial NLP.

FLAG-TRADER: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading
Guojun Xiong | Zhiyang Deng | Keyi Wang | Yupeng Cao | Haohang Li | Yangyang Yu | Xueqing Peng | Mingquan Lin | Kaleb E Smith | Xiao-Yang Liu | Jimin Huang | Sophia Ananiadou | Qianqian Xie
Findings of the Association for Computational Linguistics: ACL 2025

Large language models (LLMs) fine-tuned on multimodal financial data have demonstrated impressive reasoning capabilities in various financial tasks. However, they often struggle with multi-step, goal-oriented scenarios in interactive financial markets, such as trading, where complex agentic approaches are required to improve decision-making. To address this, we propose FLAG-Trader, a unified architecture integrating linguistic processing (via LLMs) with gradient-driven reinforcement learning (RL) policy optimization, in which a partially fine-tuned LLM acts as the policy network, leveraging pre-trained knowledge while adapting to the financial domain through parameter-efficient fine-tuning. Through policy gradient optimization driven by trading rewards, our framework not only enhances LLM performance in trading but also improves results on other financial-domain tasks. We present extensive empirical evidence to validate these enhancements.