2025
Multifaceted Evaluation of Audio-Visual Capability for MLLMs: Effectiveness, Efficiency, Generalizability and Robustness
Yusheng Zhao | Xiao Luo | Junyu Luo | Weizhi Zhang | Zhiping Xiao | Wei Ju | Philip S. Yu | Ming Zhang
Findings of the Association for Computational Linguistics: EMNLP 2025
Multi-modal large language models (MLLMs) have recently achieved great success in processing and understanding information from diverse modalities (e.g., text, audio, and visual signals). Despite their growing popularity, there remains a lack of comprehensive evaluation measuring the audio-visual capabilities of these models, especially in diverse scenarios (e.g., distribution shifts and adversarial attacks). In this paper, we present a multifaceted evaluation of the audio-visual capability of MLLMs, focusing on four key dimensions: effectiveness, efficiency, generalizability, and robustness. Through extensive experiments, we find that MLLMs exhibit strong zero-shot and few-shot generalization abilities, enabling them to perform well with limited data. However, their success relies heavily on the vision modality, which impairs performance when visual input is corrupted or missing. Additionally, while MLLMs are susceptible to adversarial samples, they demonstrate greater robustness compared to traditional models. The experimental results and our observations provide new insights into the audio-visual capabilities of MLLMs, highlighting areas for improvement and offering guidance for future research.
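A minimal sketch of the kind of modality-reliance probe the abstract describes: compare accuracy on intact audio-visual inputs against accuracy when the visual stream is corrupted or dropped. The functions `query_mllm` and `add_gaussian_noise` are illustrative assumptions, not the paper's actual evaluation code.

```python
# Hypothetical sketch: quantify how much an MLLM depends on the vision modality
# by re-evaluating it with corrupted or missing visual input.
import random

def add_gaussian_noise(frame, sigma=0.2):
    """Corrupt a toy visual frame represented as a list of floats."""
    return [x + random.gauss(0.0, sigma) for x in frame]

def accuracy(model_fn, samples):
    correct = 0
    for audio, frames, question, answer in samples:
        if model_fn(audio, frames, question).strip().lower() == answer.lower():
            correct += 1
    return correct / max(len(samples), 1)

def robustness_gap(query_mllm, samples):
    # samples: list of (audio, visual_frames, question, gold_answer) tuples
    clean_acc = accuracy(query_mllm, samples)
    corrupted = [(a, [add_gaussian_noise(f) for f in v], q, y) for a, v, q, y in samples]
    corrupted_acc = accuracy(query_mllm, corrupted)
    missing = [(a, [], q, y) for a, v, q, y in samples]  # drop vision entirely
    missing_acc = accuracy(query_mllm, missing)
    return {"clean": clean_acc, "corrupted_vision": corrupted_acc, "missing_vision": missing_acc}
```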
A Survey of RAG-Reasoning Systems in Large Language Models
Yangning Li | Weizhi Zhang | Yuyao Yang | Wei-Chieh Huang | Yaozu Wu | Junyu Luo | Yuanchen Bei | Henry Peng Zou | Xiao Luo | Yusheng Zhao | Chunkit Chan | Yankai Chen | Zhongfen Deng | Yinghui Li | Hai-Tao Zheng | Dongyuan Li | Renhe Jiang | Ming Zhang | Yangqiu Song | Philip S. Yu
Findings of the Association for Computational Linguistics: EMNLP 2025
Retrieval-Augmented Generation (RAG) lifts the factuality of Large Language Models (LLMs) by injecting external knowledge, yet it falls short on problems that demand multi-step inference; conversely, purely reasoning-oriented approaches often hallucinate or mis-ground facts. This survey synthesizes both strands under a unified reasoning-search perspective. We first map how advanced reasoning optimizes each stage of RAG (Reasoning-Enhanced RAG). Then, we show how retrieved knowledge of different types supplies missing premises and expands context for complex inference (RAG-Enhanced Reasoning). Finally, we spotlight emerging Synergized RAG-Reasoning frameworks, where (agentic) LLMs iteratively interleave search and thought to achieve state-of-the-art performance across knowledge-intensive benchmarks. We categorize methods, datasets, and open challenges, and outline research avenues toward deeper RAG-Reasoning systems that are more effective, multimodally adaptive, trustworthy, and human-centric.
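A minimal sketch of the synergized RAG-reasoning pattern the survey spotlights: the model alternates a reasoning step with an optional retrieval step until it commits to an answer. `llm` and `retrieve` are placeholder callables, not any specific system from the survey.

```python
# Sketch of an agentic loop that interleaves search and thought (assumed interface).
def rag_reasoning_loop(question, llm, retrieve, max_steps=5):
    context = []
    for _ in range(max_steps):
        prompt = (
            f"Question: {question}\n"
            f"Evidence so far: {context}\n"
            "Think step by step. If you need more facts, reply SEARCH: <query>; "
            "otherwise reply ANSWER: <final answer>."
        )
        step = llm(prompt)
        if step.startswith("SEARCH:"):
            query = step[len("SEARCH:"):].strip()
            context.extend(retrieve(query))   # interleave retrieval with reasoning
        elif step.startswith("ANSWER:"):
            return step[len("ANSWER:"):].strip()
    # Fall back to answering with whatever evidence was gathered.
    return llm(f"Question: {question}\nEvidence: {context}\nGive the best answer.")
```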
Automate Strategy Finding with LLM in Quant Investment
Zhizhuo Kou | Holam Yu | Junyu Luo | Jingshu Peng | Xujia Li | Chengzhong Liu | Juntao Dai | Lei Chen | Sirui Han | Yike Guo
Findings of the Association for Computational Linguistics: EMNLP 2025
We present a novel three-stage framework leveraging Large Language Models (LLMs) within a risk-aware multi-agent system for automated strategy finding in quantitative finance. Our approach addresses the brittleness of traditional deep learning models in financial applications by: employing prompt-engineered LLMs to generate executable alpha factor candidates across diverse financial data, implementing multimodal agent-based evaluation that filters factors based on market status and predictive quality while maintaining category balance, and deploying dynamic weight optimization that adapts to market conditions. Experimental results demonstrate the robust performance of the strategy across Chinese and US market regimes compared to established benchmarks. Our work extends LLM capabilities to quantitative trading, providing a scalable architecture for financial signal extraction and portfolio construction. The overall framework significantly outperforms all benchmarks with a 53.17% cumulative return on SSE50 (Jan 2023 to Jan 2024), demonstrating superior risk-adjusted performance and downside protection in the market.
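A hypothetical sketch of a three-stage pipeline of the kind described above: (1) an LLM proposes alpha-factor expressions, (2) candidates are filtered by a simple quality score, and (3) surviving factors are combined with weights that adapt to a market-regime signal. All function names and the weighting rule are illustrative assumptions, not the paper's implementation.

```python
# Stage 1: prompt an LLM (assumed callable) for candidate alpha factor expressions.
def generate_factor_candidates(llm, n=20):
    prompt = "Propose one executable alpha factor expression over OHLCV data."
    return [llm(prompt) for _ in range(n)]

# Stage 2: keep only factors whose quality score clears a threshold
# (score_fn might compute an information coefficient on held-out data).
def filter_factors(candidates, score_fn, threshold=0.05):
    return [f for f in candidates if score_fn(f) > threshold]

# Stage 3: toy regime-aware weighting, e.g. down-weight momentum factors in a bear regime.
def dynamic_weights(factors, regime):
    base = 1.0 / max(len(factors), 1)
    return {f: base * (0.5 if regime == "bear" and "momentum" in f else 1.0)
            for f in factors}
```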
2024
Unity in Diversity: Collaborative Pre-training Across Multimodal Medical Sources
Xiaochen Wang | Junyu Luo | Jiaqi Wang | Yuan Zhong | Xiaokun Zhang | Yaqing Wang | Parminder Bhatia | Cao Xiao | Fenglong Ma
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Although pre-training has become a prevalent approach for addressing various biomedical tasks, the current efficacy of pre-trained models is hindered by their reliance on a limited scope of medical sources. This limitation results in data scarcity during pre-training and restricts the range of applicable downstream tasks. In response to these challenges, we develop MedCSP, a new pre-training strategy designed to bridge the gap between multimodal medical sources. MedCSP employs modality-level aggregation to unify patient data within individual sources. Additionally, leveraging temporal information and diagnosis history, MedCSP effectively captures explicit and implicit correlations between patients across different sources. To evaluate the proposed strategy, we conduct comprehensive experiments based on 6 modalities from 2 real-world medical data sources, evaluating MedCSP on 4 tasks against 19 baselines, marking an initial yet essential step towards cross-source modeling in the medical domain.
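A minimal sketch of modality-level aggregation in the spirit of the abstract: each modality (notes, labs, imaging, ...) is embedded separately and then pooled into a single patient vector. The projection heads and the mean-pooling choice are assumptions for illustration, not MedCSP's actual architecture.

```python
import torch
import torch.nn as nn

class ModalityAggregator(nn.Module):
    """Unify per-modality features into one patient representation (illustrative)."""
    def __init__(self, modality_dims, hidden_dim=128):
        super().__init__()
        # One projection head per modality, mapping into a shared space.
        self.proj = nn.ModuleDict(
            {name: nn.Linear(dim, hidden_dim) for name, dim in modality_dims.items()}
        )

    def forward(self, patient):
        # patient: dict mapping modality name -> feature tensor of shape (dim,)
        embeddings = [torch.relu(self.proj[name](feat))
                      for name, feat in patient.items() if name in self.proj]
        return torch.stack(embeddings).mean(dim=0)  # pooled patient vector

# Usage with toy modality dimensions:
agg = ModalityAggregator({"notes": 768, "labs": 32, "imaging": 512})
patient = {"notes": torch.randn(768), "labs": torch.randn(32), "imaging": torch.randn(512)}
vec = agg(patient)  # shape: (128,)
```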
Zero-Resource Hallucination Prevention for Large Language Models
Junyu Luo | Cao Xiao | Fenglong Ma
Findings of the Association for Computational Linguistics: EMNLP 2024
The prevalent use of large language models (LLMs) in various domains has drawn attention to the issue of “hallucination”, which refers to instances where LLMs generate factually inaccurate or ungrounded information. Existing techniques usually identify hallucinations post-generation, so they cannot prevent their occurrence, and they suffer from inconsistent performance due to the influence of instruction format and model style. In this paper, we introduce a novel pre-detection self-evaluation technique, referred to as SELF-FAMILIARITY, which evaluates the model’s familiarity with the concepts present in the input instruction and withholds the response when unfamiliar concepts are detected, under the zero-resource setting where external ground-truth or background information is not available. We also propose a new dataset, Concept-7, focusing on hallucinations caused by limited inner knowledge. We validate SELF-FAMILIARITY across four different large language models, demonstrating consistently superior performance compared to existing techniques. Our findings suggest a significant shift towards preemptive strategies for hallucination mitigation in LLM assistants, promising improvements in reliability, applicability, and interpretability.
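A hedged sketch of the pre-detection idea described above: before generating, the model's familiarity with each concept in the instruction is scored, and the response is withheld if any concept falls below a threshold. `extract_concepts` and `familiarity_score` are placeholders, not the paper's exact procedure.

```python
# Sketch: zero-resource pre-generation guard based on concept familiarity (assumed helpers).
def guarded_generate(instruction, llm, extract_concepts, familiarity_score, threshold=0.5):
    concepts = extract_concepts(instruction)
    unfamiliar = [c for c in concepts if familiarity_score(llm, c) < threshold]
    if unfamiliar:
        # Withhold the answer instead of risking an ungrounded response.
        return f"I am not confident about: {', '.join(unfamiliar)}. Withholding an answer."
    return llm(instruction)
```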
CoRelation: Boosting Automatic ICD Coding through Contextualized Code Relation Learning
Junyu Luo | Xiaochen Wang | Jiaqi Wang | Aofei Chang | Yaqing Wang | Fenglong Ma
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Automatic International Classification of Diseases (ICD) coding plays a crucial role in the extraction of relevant information from clinical notes for proper recording and billing. One of the most important directions for boosting the performance of automatic ICD coding is modeling ICD code relations. However, current methods insufficiently model the intricate relationships among ICD codes and often overlook the importance of context in clinical notes. In this paper, we propose a novel approach, a contextualized and flexible framework, to enhance the learning of ICD code representations. Our approach, unlike existing methods, employs a dependent learning paradigm that considers the context of clinical notes in modeling all possible code relations. We evaluate our approach on six public ICD coding datasets and the experimental results demonstrate the effectiveness of our approach compared to state-of-the-art baselines.
2023
Hierarchical Pretraining on Multimodal Electronic Health Records
Xiaochen Wang | Junyu Luo | Jiaqi Wang | Ziyi Yin | Suhan Cui | Yuan Zhong | Yaqing Wang | Fenglong Ma
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Pretraining has proven to be a powerful technique in natural language processing (NLP), exhibiting remarkable success in various NLP downstream tasks. However, in the medical domain, existing pretrained models on electronic health records (EHR) fail to capture the hierarchical nature of EHR data, limiting their generalization capability across diverse downstream tasks using a single pretrained model. To tackle this challenge, this paper introduces a novel, general, and unified pretraining framework called MedHMP, specifically designed for hierarchically multimodal EHR data. The effectiveness of the proposed MedHMP is demonstrated through experimental results on eight downstream tasks spanning three levels. Comparisons against eighteen baselines further highlight the efficacy of our approach.
2022
Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation
Junyu Luo | Junxian Lin | Chi Lin | Cao Xiao | Xinning Gui | Fenglong Ma
Proceedings of the 29th International Conference on Computational Linguistics
Patients with low health literacy usually have difficulty understanding medical jargon and the complex structure of professional medical language. Although some studies have proposed automatically translating expert language into layperson-understandable language, only a few of them focus on both accuracy and readability simultaneously in the clinical domain. Thus, clinical language simplification remains a challenging task that has not been fully addressed in previous work. To benchmark this task, we construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches. In addition, we propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance compared with eight strong baselines. To fairly evaluate performance, we also propose three task-specific evaluation metrics. Experimental results demonstrate the utility of the annotated MedLane dataset and the effectiveness of the proposed DECLARE model.
2021
Fusion: Towards Automated ICD Coding via Feature Compression
Junyu Luo | Cao Xiao | Lucas Glass | Jimeng Sun | Fenglong Ma
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021