Yi Han
Papers on this page may belong to the following people: Yi Han, Yi Han
2026
MultiFinBen: Benchmarking Large Language Models for Multilingual and Multimodal Financial Application
Xueqing Peng | Lingfei Qian | Yan Wang | Ruoyu Xiang | Yueru He | Yang Ren | Mingyang Jiang | Vincent Jim Zhang | Yuqing Guo | Jeff Zhao | Huan He | Yi Han | Yun Feng | Yuechen Jiang | Yupeng Cao | Haohang Li | Yangyang Yu | Xiaoyu Wang | Penglei Gao | Shengyuan Lin | Keyi Wang | Shanshan Yang | Yilun Zhao | Zhiwei Liu | Peng Lu | Jerry Huang | Suyuchen Wang | Triantafillos Papadopoulos | Polydoros Giannouris | Efstathia Soufleri | Nuo Chen | Zhiyang Deng | Heming Fu | Yijia Zhao | Mingquan Lin | Meikang Qiu | Kaleb E Smith | Arman Cohan | Xiao-Yang Liu | Jimin Huang | Guojun Xiong | Alejandro Lopez-Lira | Xi Chen | Junichi Tsujii | Jian-Yun Nie | Sophia Ananiadou | Qianqian Xie
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Real-world financial analysis involves information across multiple languages and modalities, from reports and news to scanned filings and meeting recordings. Yet most existing evaluations of LLMs in finance remain text-only, monolingual, and largely saturated by current models. To bridge these gaps, we present MultiFinBen, the first expert-annotated multilingual (five languages) and multimodal (text, vision, audio) benchmark for evaluating LLMs in realistic financial contexts. MultiFinBen introduces two new task families: multilingual financial reasoning, which tests cross-lingual evidence integration from filings and news, and financial OCR, which extracts structured text from scanned documents containing tables and charts. Rather than aggregating all available datasets, we apply a structured, difficulty-aware selection based on advanced model performance, ensuring balanced challenge and removing redundant tasks. Evaluating 21 leading LLMs shows that even frontier multimodal models like GPT-4o achieve only 46.01% overall, stronger on vision and audio but dropping sharply in multilingual settings. These findings expose persistent limitations in multilingual, multimodal, and expert-level financial reasoning. All datasets, evaluation scripts, and leaderboards are publicly released.
2025
Nullspace Disentanglement for Red Teaming Language Models
Yi Han | Yuanxing Liu | Weinan Zhang | Ting Liu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
With the widespread deployment of generative language models, concerns about safety issues have continuously grown. High-quality fine-tuning data generated from red teaming plays a crucial role in the model’s safety. Recently, automated red teaming approaches have been proposed to create test cases. However, these approaches, which rely on open-ended generation, encounter issues related to inefficiency and low attack success rates. In this work, we introduce a black-box approach that ingeniously exploits the unique properties of the nullspace to disentangle and regulate the crucial success information within test cases. Our study provides a brand-new perspective for automated red team research. Experimental results demonstrate that our approach outperforms baseline methods regarding the attack success rate. The generated test cases also excel in aspects of diversity and fluency.
2024
Definition Generation for Automatically Induced Semantic Frame
Yi Han | Ryohei Sasano | Koichi Takeda
Findings of the Association for Computational Linguistics: ACL 2024
In a semantic frame resource such as FrameNet, the definition sentence of a frame is essential for humans to understand the meaning of the frame intuitively. Recently, several attempts have been made to induce semantic frames from large corpora, but the cost of creating the definition sentences for such frames is significant. In this paper, we address a new task of generating frame definitions from a set of frame-evoking words. Specifically, given a cluster of frame-evoking words and associated exemplars induced as the same semantic frame, we utilize a large language model to generate frame definitions. We demonstrate that incorporating frame element reasoning as chain-of-thought can enhance the inclusion of correct frame elements in the generated definitions.
Shoes-ACOSI: A Dataset for Aspect-Based Sentiment Analysis with Implicit Opinion Extraction
Joseph J Peper | Wenzhao Qiu | Ryan Bruggeman | Yi Han | Estefania Ciliotta Chehade | Lu Wang
Findings of the Association for Computational Linguistics: EMNLP 2024
We explore *implicit opinion extraction* as a new component of aspect-based sentiment analysis (ABSA) systems. Prior work in ABSA has investigated opinion extraction as an important subtask; however, these works only label concise, *explicitly*-stated opinion spans. In this work, we present **Shoes-ACOSI**, a new and challenging ABSA dataset in the e-commerce domain with implicit opinion span annotations, the first of its kind. Shoes-ACOSI builds upon the existing Aspect-Category-Opinion-Sentiment (ACOS) quadruple extraction task, extending the task to quintuple extraction that localizes and differentiates both implicit and explicit opinions. In addition to the new annotation schema, our dataset contains paragraph-length inputs which, importantly, present complex challenges through increased input length, a larger number of sentiment expressions, and more mixed-sentiment-polarity examples when compared with existing benchmarks. We quantify the difficulty of our new dataset by evaluating it with state-of-the-art fully-supervised and prompted-LLM baselines. We find our dataset presents significant challenges for both supervised models and LLMs, particularly from the new implicit opinion extraction component of the ACOSI task, highlighting the need for continued research into implicit opinion understanding.
2023
Through the Lens of Core Competency: Survey on Evaluation of Large Language Models
Ziyu Zhuang | Qiguang Chen | Longxuan Ma | Mingda Li | Yi Han | Yushan Qian | Haopeng Bai | Weinan Zhang | Ting Liu
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 2: Frontier Forum)
From pre-trained language model (PLM) to large language model (LLM), the field of natural language processing (NLP) has witnessed steep performance gains and wide practical uses. The evaluation of a research field guides its direction of improvement. However, LLMs are extremely hard to thoroughly evaluate for two reasons. First of all, traditional NLP tasks become inadequate due to the excellent performance of LLM. Secondly, existing evaluation tasks are difficult to keep up with the wide range of applications in real-world scenarios. To tackle these problems, existing works proposed various benchmarks to better evaluate LLMs. To clarify the numerous evaluation tasks in both academia and industry, we investigate multiple papers concerning LLM evaluations. We summarize 4 core competencies of LLM, including reasoning, knowledge, reliability, and safety. For every competency, we introduce its definition, corresponding benchmarks, and metrics. Under this competency architecture, similar tasks are combined to reflect corresponding ability, while new tasks can also be easily added into the system. Finally, we give our suggestions on the future direction of LLM’s evaluation.
2022
Automating Interlingual Homograph Recognition with Parallel Sentences
Yi Han | Ryohei Sasano | Koichi Takeda
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022
Interlingual homographs are words that are spelled the same but have different meanings across languages. Recognizing interlingual homographs among form-identical words generally requires linguistic knowledge and massive annotation work. In this paper, we propose an automatic interlingual homograph recognition method based on cross-lingual word embedding similarity and the co-occurrence of form-identical words in parallel sentences. We conduct experiments with various off-the-shelf language models, combined with cross-lingual alignment operations and co-occurrence metrics, on the Chinese-Japanese and English-Dutch language pairs. Experimental results demonstrate that our proposed method is able to make accurate and consistent predictions across languages.
Co-authors
- Ting Liu 2
- Ryohei Sasano 2
- Koichi Takeda 2
- Sophia Ananiadou 1
- Haopeng Bai 1
- Ryan Bruggeman 1
- Yupeng Cao 1
- Estefania Ciliotta Chehade 1
- Qiguang Chen (陈麒光) 1
- Nuo Chen 1
- Xi Chen 1
- Arman Cohan 1
- Zhiyang Deng 1
- Yun Feng 1
- Heming Fu 1
- Penglei Gao 1
- Polydoros Giannouris 1
- Yuqing Guo 1
- Yueru He 1
- Huan He 1
- Jerry Huang 1
- Jimin Huang 1
- Mingyang Jiang 1
- Yuechen Jiang 1
- Mingda Li 1
- Haohang Li 1
- Shengyuan Lin 1
- Mingquan Lin 1
- Zhiwei Liu 1
- Xiao-Yang Liu 1
- Yuanxing Liu 1
- Alejandro Lopez-Lira 1
- Peng Lu 1
- Longxuan Ma 1
- Jian-Yun Nie 1
- Triantafillos Papadopoulos 1
- Xueqing Peng 1
- Joseph J. Peper 1
- Yushan Qian 1
- Lingfei Qian 1
- Meikang Qiu 1
- Wenzhao Qiu 1
- Yang Ren 1
- Kaleb E. Smith 1
- Efstathia Soufleri 1
- Jun’ichi Tsujii 1
- Yan Wang 1
- Xiaoyu Wang 1
- Keyi Wang 1
- Suyuchen Wang 1
- Lu Wang 1
- Ruoyu Xiang 1
- Qianqian Xie 1
- Guojun Xiong 1
- Shanshan Yang 1
- Yangyang Yu 1
- Weinan Zhang 1
- Vincent Jim Zhang 1
- Weinan Zhang 1
- Jeff Zhao 1
- Yilun Zhao 1
- Yijia Zhao 1
- Ziyu Zhuang 1