Polydoros Giannouris
2026
MultiFinBen: Benchmarking Large Language Models for Multilingual and Multimodal Financial Application
Xueqing Peng | Lingfei Qian | Yan Wang | Ruoyu Xiang | Yueru He | Yang Ren | Mingyang Jiang | Vincent Jim Zhang | Yuqing Guo | Jeff Zhao | Huan He | Yi Han | Yun Feng | Yuechen Jiang | Yupeng Cao | Haohang Li | Yangyang Yu | Xiaoyu Wang | Penglei Gao | Shengyuan Lin | Keyi Wang | Shanshan Yang | Yilun Zhao | Zhiwei Liu | Peng Lu | Jerry Huang | Suyuchen Wang | Triantafillos Papadopoulos | Polydoros Giannouris | Efstathia Soufleri | Nuo Chen | Zhiyang Deng | Heming Fu | Yijia Zhao | Mingquan Lin | Meikang Qiu | Kaleb E Smith | Arman Cohan | Xiao-Yang Liu | Jimin Huang | Guojun Xiong | Alejandro Lopez-Lira | Xi Chen | Junichi Tsujii | Jian-Yun Nie | Sophia Ananiadou | Qianqian Xie
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Real-world financial analysis involves information across multiple languages and modalities, from reports and news to scanned filings and meeting recordings. Yet most existing evaluations of LLMs in finance remain text-only, monolingual, and largely saturated by current models. To bridge these gaps, we present MultiFinBen, the first expert-annotated multilingual (five languages) and multimodal (text, vision, audio) benchmark for evaluating LLMs in realistic financial contexts. MultiFinBen introduces two new task families: multilingual financial reasoning, which tests cross-lingual evidence integration from filings and news, and financial OCR, which extracts structured text from scanned documents containing tables and charts. Rather than aggregating all available datasets, we apply a structured, difficulty-aware selection based on advanced model performance, ensuring balanced challenge and removing redundant tasks. Evaluating 21 leading LLMs shows that even frontier multimodal models like GPT-4o achieve only 46.01% overall, stronger on vision and audio but dropping sharply in multilingual settings. These findings expose persistent limitations in multilingual, multimodal, and expert-level financial reasoning. All datasets, evaluation scripts, and leaderboards are publicly released.
Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection
Zhiwei Liu | Yupeng Cao | Yuechen Jiang | Mohsinul Kabir | Polydoros Giannouris | Chen Xu | Ziyang Xu | Tianlei Zhu | Md. Tariquzzaman | Triantafillos Papadopoulos | Yan Wang | Lingfei Qian | Xueqing Peng | Zhuohan Xie | Ye Yuan | Saeed Almheiri | Abdulrazzaq Alnajjar | Ming-Bin Chen | Harry Stuart | Paul Thompson | Prayag Tiwari | Alejandro Lopez-Lira | Xue Liu | Jimin Huang | Sophia Ananiadou
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) have been widely applied across various domains of finance. Since their training data are largely derived from human-authored corpora, LLMs may inherit a range of human biases. Behavioral biases can lead to instability and uncertainty in decision-making, particularly when processing financial information. However, existing research on LLM bias has mainly focused on direct questioning or simplified, general-purpose settings, with limited consideration of the complex real-world financial environments and high-risk, context-sensitive, multilingual financial misinformation detection tasks (MFMD). In this work, we propose MFMDScen, a comprehensive benchmark for evaluating behavioral biases of LLMs in MFMD across diverse economic scenarios. In collaboration with financial experts, we construct three types of complex financial scenarios: (i) role- and personality-based, (ii) role- and region-based, and (iii) role-based scenarios incorporating ethnicity and religious beliefs. We further develop a multilingual financial misinformation dataset covering English, Chinese, Greek, and Bengali. By integrating these scenarios with misinformation claims, MFMDScen enables a systematic evaluation of 22 mainstream LLMs. Our findings reveal that pronounced behavioral biases persist across both commercial and open-source models. This project is available at https://github.com/lzw108/FMD.
2025
Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance
Xueqing Peng | Triantafillos Papadopoulos | Efstathia Soufleri | Polydoros Giannouris | Ruoyu Xiang | Yan Wang | Lingfei Qian | Jimin Huang | Qianqian Xie | Sophia Ananiadou
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Despite Greece’s pivotal role in the global economy, large language models (LLMs) remain underexplored for Greek financial context due to the linguistic complexity of Greek and the scarcity of domain-specific datasets. While multilingual financial NLP has revealed large performance gaps across languages, no benchmarks or LLMs have been tailored for Greek financial tasks until now. To bridge this gap, we introduce Plutus-ben, the first Greek Financial Evaluation Benchmark, and Plutus-8B, the first financial LLM fine-tuned on Greek-specific financial data. Plutus-ben addresses six core tasks: numeric/textual named entity recognition, question answering, extractive summarization, abstractive summarization, and topic classification. To support these tasks, we release four new expert-annotated Greek financial datasets and incorporate two existing resources. Our comprehensive evaluation of 24 LLMs reveals persistent challenges in Greek financial NLP, driven by linguistic complexity, domain terminology, and financial reasoning gaps. Experiment results underscore the limitations of cross-lingual transfer and the need for Greek-specific financial modeling. We publicly release Plutus-ben, Plutus-8B, and all associated datasets to promote reproducible research and advance multilingual financial NLP.
FinNLP-FNP-LLMFinLegal-2025 Shared Task: Financial Misinformation Detection Challenge Task
Zhiwei Liu | Keyi Wang | Zhuo Bao | Xin Zhang | Jiping Dong | Kailai Yang | Mohsinul Kabir | Polydoros Giannouris | Rui Xing | Seongchan Park | Jaehong Kim | Dong Li | Qianqian Xie | Sophia Ananiadou
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)
Despite the promise of large language models (LLMs) in finance, their capabilities for financial misinformation detection (FMD) remain largely unexplored. To evaluate the capabilities of LLMs in the FMD task, we introduce the FMD Challenge, a financial misinformation detection shared task featured at COLING FinNLP-FNP-LLMFinLegal-2024. This challenge aims to evaluate the ability of LLMs to verify financial misinformation while generating plausible explanations. In this paper, we provide an overview of the task and dataset, summarize participants’ methods, and present their experimental evaluations, highlighting the effectiveness of LLMs in addressing the FMD task. To the best of our knowledge, the FMD Challenge is one of the first challenges for assessing LLMs in the field of FMD. Therefore, we provide detailed observations and draw conclusions for the future development of this field.
2024
Plain Language Summarization of Clinical Trials
Polydoros Giannouris | Theodoros Myridis | Tatiana Passali | Grigorios Tsoumakas
Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024
Plain language summarization, or lay summarization, is an emerging natural language processing task, aiming to make scientific articles accessible to an audience of non-scientific backgrounds. The healthcare domain can greatly benefit from applications of automatic plain language summarization, as results that concern a large portion of the population are reported in large documents with complex terminology. However, existing corpora for this task are limited in scope, usually regarding conference or journal article abstracts. In this paper, we introduce the task of automated generation of plain language summaries for clinical trials, and construct CARES (Clinical Abstractive Result Extraction and Simplification), the first corresponding dataset. CARES consists of publicly available, human-written summaries of clinical trials conducted by Pfizer. Source text is identified from documents released throughout the life-cycle of the trial, and steps are taken to remove noise and select the appropriate sections. Experiments show that state-of-the-art models achieve satisfactory results in most evaluation metrics.
Co-authors
- Sophia Ananiadou 4
- Jimin Huang 3
- Triantafillos Papadopoulos 3
- Xueqing Peng 3
- Lingfei Qian 3
- Qianqian Xie 3
- Yupeng Cao 2
- Yuechen Jiang 2
- Mohsinul Kabir 2
- Zhiwei Liu 2
- Alejandro Lopez-Lira 2
- Efstathia Soufleri 2
- Yan Wang 2
- Ruoyu Xiang 2
- Saeed Almheiri 1
- Abdulrazzaq Alnajjar 1
- Zhuo Bao 1
- Nuo Chen 1
- Xi Chen 1
- Ming-Bin Chen 1
- Arman Cohan 1
- Zhiyang Deng 1
- Jiping Dong 1
- Yun Feng 1
- Heming Fu 1
- Penglei Gao 1
- Yuqing Guo 1
- Yi Han 1
- Yueru He 1
- Huan He 1
- Jerry Huang 1
- Mingyang Jiang 1
- Jaehong Kim 1
- Haohang Li 1
- Dong Li 1
- Shengyuan Lin 1
- Mingquan Lin 1
- Xiao-Yang Liu 1
- Xue Liu 1
- Zhiwei Liu 1
- Peng Lu 1
- Theodoros Myridis 1
- Jian-Yun Nie 1
- Seongchan Park 1
- Tatiana Passali 1
- Meikang Qiu 1
- Yang Ren 1
- Kaleb E. Smith 1
- Harry Stuart 1
- Md. Tariquzzaman 1
- Paul Thompson 1
- Prayag Tiwari 1
- Grigorios Tsoumakas 1
- Jun’ichi Tsujii 1
- Xiaoyu Wang 1
- Keyi Wang 1
- Suyuchen Wang 1
- Yan Wang 1
- Keyi Wang 1
- Zhuohan Xie 1
- Rui Xing 1
- Guojun Xiong 1
- Chen Xu 1
- Ziyang Xu 1
- Shanshan Yang 1
- Kailai Yang 1
- Yangyang Yu 1
- Ye Yuan 1
- Vincent Jim Zhang 1
- Xin Zhang 1
- Jeff Zhao 1
- Yilun Zhao 1
- Yijia Zhao 1
- Tianlei Zhu 1