Alejandro Lopez-Lira
2026
MultiFinBen: Benchmarking Large Language Models for Multilingual and Multimodal Financial Application
Xueqing Peng | Lingfei Qian | Yan Wang | Ruoyu Xiang | Yueru He | Yang Ren | Mingyang Jiang | Vincent Jim Zhang | Yuqing Guo | Jeff Zhao | Huan He | Yi Han | Yun Feng | Yuechen Jiang | Yupeng Cao | Haohang Li | Yangyang Yu | Xiaoyu Wang | Penglei Gao | Shengyuan Lin | Keyi Wang | Shanshan Yang | Yilun Zhao | Zhiwei Liu | Peng Lu | Jerry Huang | Suyuchen Wang | Triantafillos Papadopoulos | Polydoros Giannouris | Efstathia Soufleri | Nuo Chen | Zhiyang Deng | Heming Fu | Yijia Zhao | Mingquan Lin | Meikang Qiu | Kaleb E Smith | Arman Cohan | Xiao-Yang Liu | Jimin Huang | Guojun Xiong | Alejandro Lopez-Lira | Xi Chen | Junichi Tsujii | Jian-Yun Nie | Sophia Ananiadou | Qianqian Xie
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Real-world financial analysis involves information across multiple languages and modalities, from reports and news to scanned filings and meeting recordings. Yet most existing evaluations of LLMs in finance remain text-only, monolingual, and largely saturated by current models. To bridge these gaps, we present MultiFinBen, the first expert-annotated multilingual (five languages) and multimodal (text, vision, audio) benchmark for evaluating LLMs in realistic financial contexts. MultiFinBen introduces two new task families: multilingual financial reasoning, which tests cross-lingual evidence integration from filings and news, and financial OCR, which extracts structured text from scanned documents containing tables and charts. Rather than aggregating all available datasets, we apply a structured, difficulty-aware selection based on advanced model performance, ensuring balanced challenge and removing redundant tasks. Evaluating 21 leading LLMs shows that even frontier multimodal models like GPT-4o achieve only 46.01% overall, stronger on vision and audio but dropping sharply in multilingual settings. These findings expose persistent limitations in multilingual, multimodal, and expert-level financial reasoning. All datasets, evaluation scripts, and leaderboards are publicly released.
All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection
Yuechen Jiang | Zhiwei Liu | Yupeng Cao | Yueru He | Ziyang Xu | Chen Xu | Zhiyang Deng | Prayag Tiwari | Xi Chen | Alejandro Lopez-Lira | Jimin Huang | Junichi Tsujii | Sophia Ananiadou
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We introduce RFC-Bench, a benchmark for evaluating large language models on financial misinformation under realistic news. RFC-Bench operates at the paragraph level and captures the contextual complexity of financial news where meaning emerges from dispersed cues. The benchmark defines two complementary tasks: reference-free misinformation detection and comparison-based diagnosis using paired original–perturbed inputs. Experiments reveal a consistent pattern: performance is substantially stronger when comparative context is available, while reference-free settings expose significant weaknesses, including unstable predictions and elevated invalid outputs. These results indicate that current models struggle to maintain coherent belief states without external grounding. By highlighting this gap, RFC-Bench provides a structured testbed for studying reference-free reasoning and advancing more reliable financial misinformation detection in real-world settings.
LLM as a Risk Manager: LLM Semantic Filtering for Lead–Lag Trading in Prediction Markets
Sumin Kim | Minjae Kim | Jihoon Kwon | Yoon Kim | Oscar Levy | Alejandro Lopez-Lira | Yongjae Lee | Chanyeol Choi
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Prediction markets provide a unique setting where event-level time series are directly tied to natural-language descriptions, yet discovering robust lead–lag relationships remains challenging due to spurious statistical correlations. We propose a hybrid two-stage causal screener to address this challenge: (i) a statistical stage that uses Granger causality to identify candidate leader–follower pairs from market-implied probability time series, and (ii) an LLM-based semantic stage that re-ranks these candidates by assessing whether the proposed direction admits a plausible economic transmission mechanism based on event descriptions. Because causal ground truth is unobserved, we evaluate the ranked pairs using a fixed, signal-triggered trading protocol that maps relationship quality into realized profit and loss (PnL). On Kalshi Economics markets, our hybrid approach consistently outperforms the statistical baseline. Across rolling evaluations, the win rate increases from 51.4% to 54.5%. Crucially, the average magnitude of losing trades decreases substantially from 649 USD to 347 USD. This reduction is driven by the LLM’s ability to filter out statistically fragile links that are prone to large losses, rather than relying on rare gains. These improvements remain stable across different trading configurations, indicating that the gains are not driven by specific parameter choices. Overall, the results suggest that LLMs function as semantic risk managers on top of statistical discovery, prioritizing lead–lag relationships that generalize under changing market conditions.
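The two summary statistics quoted in this abstract, win rate and average magnitude of losing trades, can be illustrated with a minimal sketch. The function names and the per-trade PnL lists below are hypothetical and invented for illustration; they are not the paper's code or data. Only the figures 649 and 347 come from the abstract.

```python
from statistics import mean


def win_rate(pnls):
    """Fraction of trades with strictly positive PnL."""
    if not pnls:
        raise ValueError("no trades to evaluate")
    return sum(1 for p in pnls if p > 0) / len(pnls)


def avg_loss_magnitude(pnls):
    """Mean absolute size of losing trades (0.0 if there are none)."""
    losses = [-p for p in pnls if p < 0]
    return mean(losses) if losses else 0.0


# Hypothetical per-trade PnL in USD, made up for illustration only.
baseline_pnl = [120.0, -700.0, 80.0, -600.0]
hybrid_pnl = [110.0, -350.0, 90.0, 60.0]

for name, pnls in [("baseline", baseline_pnl), ("hybrid", hybrid_pnl)]:
    print(name, win_rate(pnls), avg_loss_magnitude(pnls))

# Relative reduction in average loss implied by the reported figures
# (649 USD for the statistical baseline, 347 USD for the hybrid).
reported_reduction = (649 - 347) / 649
print(f"reported reduction in average loss: {reported_reduction:.1%}")
```

Under this reading, the reported drop from 649 USD to 347 USD corresponds to an average losing trade that is a little under half as large, which is a much bigger relative change than the roughly three-point gain in win rate.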
2024
FinNLP-AgentScen-2024 Shared Task: Financial Challenges in Large Language Models - FinLLMs
Qianqian Xie | Jimin Huang | Dong Li | Zhengyu Chen | Ruoyu Xiang | Mengxi Xiao | Yangyang Yu | Vijayasai Somasundaram | Kailai Yang | Chenhan Yuan | Zheheng Luo | Zhiwei Liu | Yueru He | Yuechen Jiang | Haohang Li | Duanyu Feng | Xiao-Yang Liu | Benyou Wang | Hao Wang | Yanzhao Lai | Jordan Suchow | Alejandro Lopez-Lira | Min Peng | Sophia Ananiadou
Proceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning
Co-authors
- Sophia Ananiadou 3
- Yueru He 3
- Jimin Huang 3
- Yuechen Jiang 3
- Yupeng Cao 2
- Xi Chen 2
- Zhiyang Deng 2
- Haohang Li 2
- Xiao-Yang Liu 2
- Zhiwei Liu 2
- Jun’ichi Tsujii 2
- Ruoyu Xiang 2
- Qianqian Xie 2
- Yangyang Yu 2
- Nuo Chen 1
- Zhengyu Chen 1
- Chanyeol Choi 1
- Arman Cohan 1
- Duanyu Feng 1
- Yun Feng 1
- Heming Fu 1
- Penglei Gao 1
- Polydoros Giannouris 1
- Yuqing Guo 1
- Yi Han 1
- Huan He 1
- Jerry Huang 1
- Mingyang Jiang 1
- Minjae Kim 1
- Sumin Kim 1
- Yoon Kim 1
- Jihoon Kwon 1
- Yanzhao Lai 1
- Yongjae Lee 1
- Oscar Levy 1
- Dong Li 1
- Mingquan Lin 1
- Shengyuan Lin 1
- Zhiwei Liu 1
- Peng Lu 1
- Zheheng Luo 1
- Jian-Yun Nie 1
- Triantafillos Papadopoulos 1
- Min Peng 1
- Xueqing Peng 1
- Lingfei Qian 1
- Meikang Qiu 1
- Yang Ren 1
- Kaleb E. Smith 1
- Vijayasai Somasundaram 1
- Efstathia Soufleri 1
- Jordan Suchow 1
- Prayag Tiwari 1
- Benyou Wang 1
- Hao Wang 1
- Keyi Wang 1
- Suyuchen Wang 1
- Xiaoyu Wang 1
- Yan Wang 1
- Mengxi Xiao 1
- Guojun Xiong 1
- Chen Xu 1
- Ziyang Xu 1
- Kailai Yang 1
- Shanshan Yang 1
- Chenhan Yuan 1
- Vincent Jim Zhang 1
- Jeff Zhao 1
- Yijia Zhao 1
- Yilun Zhao 1