Junxi Liu
2025
Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models
Xiaojun Wu | Junxi Liu | Huan-Yi Su | Zhouchi Lin | Yiyan Qi | Chengjin Xu | Jiajun Su | Jiajie Zhong | Fuwei Wang | Saizhuo Wang | Fengrui Hua | Jia Li | Jian Guo
Findings of the Association for Computational Linguistics: EMNLP 2025
As large language models (LLMs) increasingly permeate the financial sector, there is a pressing need for a standardized method to comprehensively assess their performance. Existing financial benchmarks often suffer from limited language and task coverage, low-quality datasets, and inadequate adaptability for LLM evaluation. To address these limitations, we introduce Golden Touchstone, a comprehensive bilingual benchmark for financial LLMs, encompassing eight core financial NLP tasks in both Chinese and English. Developed from extensive open-source data collection and industry-specific demands, this benchmark thoroughly assesses models’ language understanding and generation capabilities. Through comparative analysis of major models such as GPT-4o, Llama3, FinGPT, and FinMA, we reveal their strengths and limitations in processing complex financial information. Additionally, we open-source Touchstone-GPT, a financial LLM trained through continual pre-training and instruction tuning, which demonstrates strong performance on the bilingual benchmark but still has limitations in specific tasks. This research provides a practical evaluation tool for financial LLMs and guides future development and optimization. The source code for Golden Touchstone and the model weights of Touchstone-GPT have been made publicly available at https://github.com/IDEA-FinAI/Golden-Touchstone.
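As a rough illustration of how a benchmark like this is typically consumed, the Python sketch below scores a model's answers against gold labels for a single classification-style task. The file name, the JSONL layout, and the `prompt`/`answer` field names are hypothetical assumptions for illustration only, not Golden Touchstone's actual schema; the repository linked above documents the real format.

```python
# Minimal sketch of benchmark-style evaluation (hypothetical schema).
# Assumes each record carries a "prompt" and a gold "answer" string;
# generate() is a stub standing in for any LLM call (API or local model).
import json

def generate(prompt: str) -> str:
    """Placeholder for a real LLM call; plug in your model here."""
    raise NotImplementedError

def evaluate_task(path: str) -> float:
    # One JSON object per line (JSONL), a common benchmark layout.
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f]
    correct = 0
    for rec in records:
        prediction = generate(rec["prompt"]).strip()
        correct += prediction == rec["answer"].strip()
    return correct / len(records)

# Example call with a hypothetical task file:
# print(evaluate_task("golden_touchstone/en_sentiment.jsonl"))
```

Exact-match accuracy as used here only suits tasks with a single canonical label; generation tasks in such benchmarks are usually scored with metrics like ROUGE or BLEU instead.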
2022
A Novel Framework Based on Medical Concept Driven Attention for Explainable Medical Code Prediction via External Knowledge
Tao Wang | Linhai Zhang | Chenchen Ye | Junxi Liu | Deyu Zhou
Findings of the Association for Computational Linguistics: ACL 2022
Medical code prediction from clinical notes aims at automatically associating medical codes with clinical notes. The rare code problem, i.e., medical codes with low occurrences, is prominent in medical code prediction. Recent studies employ deep neural networks and external knowledge to tackle it. However, such approaches lack interpretability, which is a vital issue in medical applications. Moreover, due to the lengthy and noisy clinical notes, such approaches fail to achieve satisfactory results. Therefore, in this paper, we propose a novel framework based on medical concept driven attention to incorporate external knowledge for explainable medical code prediction. Specifically, both the clinical notes and Wikipedia documents are aligned into a topic space to extract medical concepts using topic modeling. Then, a medical concept-driven attention mechanism is applied to uncover the concepts related to each medical code, which provide explanations for medical code prediction. Experimental results on the benchmark dataset show the superiority of the proposed framework over several state-of-the-art baselines.
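To make the attention step concrete, the following is a minimal sketch of a concept-driven attention layer in the spirit of the abstract: topic-derived concept vectors act as queries over the encoded note tokens, and the per-concept summaries feed a multi-label code classifier. The dot-product attention form, the learned stand-in for topic-model concepts, and all dimensions are illustrative assumptions rather than the paper's exact architecture.

```python
# Illustrative concept-driven attention (assumed dot-product form,
# not the paper's exact architecture).
import torch
import torch.nn as nn

class ConceptAttention(nn.Module):
    def __init__(self, hidden_dim: int, num_concepts: int, num_codes: int):
        super().__init__()
        # In the paper, concepts come from topic modeling over clinical
        # notes and Wikipedia; here they are learned parameters standing
        # in for those topic vectors.
        self.concepts = nn.Parameter(torch.randn(num_concepts, hidden_dim))
        self.classifier = nn.Linear(num_concepts * hidden_dim, num_codes)

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, seq_len, hidden_dim) encoded clinical note.
        scores = torch.einsum("bsh,ch->bcs", tokens, self.concepts)
        weights = scores.softmax(dim=-1)  # attention over tokens, per concept
        summary = torch.einsum("bcs,bsh->bch", weights, tokens)
        logits = self.classifier(summary.flatten(1))
        # `weights` indicate which tokens support each concept, giving the
        # kind of per-code explanation the abstract describes.
        return logits, weights
```

Returning the attention weights alongside the logits is what makes the prediction inspectable: for a predicted code, one can read off which concepts fired and which note tokens each concept attended to.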