Yinglong Wang
2026
SOAPTriage: SOAP-Guided Multi-View Clinical Text Modeling Framework for Automated ESI Prediction
Enming Wang | Jianlei Wang | Xueping Peng | Hongjiao Guan | Yinglong Wang | Sibo Wei | Jianbin Guo | Ruifeng Xu | Wenpeng Lu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Emergency departments (ED) rely on the Emergency Severity Index (ESI) to assess patient acuity and prioritize care, a process that is largely driven by clinical triage text. Despite recent progress in automated ESI prediction, two fundamental challenges remain: the scarcity of high-quality triage text data due to privacy and regulatory constraints and the lack of a clinically grounded triage framework capable of explicitly capturing the multidimensional structure of triage reasoning. To address these challenges, we draw inspiration from the clinically grounded SOAP paradigm, in which SOAP refers to Subjective, Objective, Assessment, and Plan and captures four complementary aspects of clinical reasoning. Building on this paradigm, we propose SOAPTriage, a SOAP-guided multi-view clinical text modeling framework for automated ESI prediction. To mitigate data scarcity, SOAPTriage introduces a Clinical Note Augmentation (CNA) module that generates natural-language triage notes from structured ED records, resulting in 15,393 augmented clinical notes derived from a real-world dataset. To incorporate clinical structure, SOAPTriage employs a SOAP-Guided Encoding (SGE) module that models patient conditions from four complementary SOAP perspectives, together with an adaptive SOAP-Aware Aggregation and Inference (SAAI) module that performs multi-view reasoning to infer ESI levels. Extensive experiments show that SOAPTriage consistently outperforms strong prompting-based, multi-agent, and encoder-based baselines, demonstrating the effectiveness of SOAP-guided multi-view clinical text modeling for automated emergency triage.
2025
VisFinEval: A Scenario-Driven Chinese Multimodal Benchmark for Holistic Financial Understanding
Zhaowei Liu | Xin Guo | Haotian Xia | Lingfeng Zeng | Fangqi Lou | Jinyi Niu | Mengping Li | Qi Qi | Jiahuan Li | Wei Zhang | Yinglong Wang | Weige Cai | Weining Shen | Liwen Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Multimodal large language models (MLLMs) hold great promise for automating complex financial analysis. To comprehensively evaluate their capabilities, we introduce VisFinEval, the first large-scale Chinese benchmark that spans the full front-middle-back office lifecycle of financial tasks. VisFinEval comprises 15,848 annotated question–answer pairs drawn from eight common financial image modalities (e.g., K-line charts, financial statements, official seals), organized into three hierarchical scenario depths: Financial Knowledge & Data Analysis, Financial Analysis & Decision Support, and Financial Risk Control & Asset Optimization. We evaluate 21 state-of-the-art MLLMs in a zero-shot setting. The top model, Qwen-VL-max, achieves an overall accuracy of 76.3%, outperforming non-expert humans but trailing financial experts by over 14 percentage points. Our error analysis uncovers six recurring failure modes—including cross-modal misalignment, hallucinations, and lapses in business-process reasoning—that highlight critical avenues for future research. VisFinEval aims to accelerate the development of robust, domain-tailored MLLMs capable of seamlessly integrating textual and visual financial information. The data and the code are available at https://github.com/SUFE-AIFLM-Lab/VisFinEval.