PathoHR: Hierarchical Reasoning for Vision-Language Models in Pathology

Yating Huang; Ziyan Huang; Lintao Xiang; Qijun Yang; Hujun Yin

doi:10.18653/v1/2025.findings-emnlp.124

Abstract

Accurate analysis of pathological images is essential for automated tumor diagnosis but remains challenging due to high structural similarity and subtle morphological variations in tissue images. Current vision-language (VL) models often struggle to capture the complex reasoning required for interpreting structured pathological reports. To address these limitations, we propose PathoHR-Bench, a novel benchmark designed to evaluate VL models’ abilities in hierarchical semantic understanding and compositional reasoning within the pathology domain. Results on this benchmark reveal that existing VL models fail to effectively model intricate cross-modal relationships, which limits their applicability in clinical settings. To overcome this, we further introduce a pathology-specific VL training scheme that generates enhanced and perturbed samples for multimodal contrastive learning. Experimental evaluations demonstrate that our approach achieves state-of-the-art performance on PathoHR-Bench and six additional pathology datasets, highlighting its effectiveness in fine-grained pathology representation.

Anthology ID: 2025.findings-emnlp.124
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2296–2311
URL: https://preview.aclanthology.org/ingest-luhme/2025.findings-emnlp.124/
DOI: 10.18653/v1/2025.findings-emnlp.124
Cite (ACL): Yating Huang, Ziyan Huang, Lintao Xiang, Qijun Yang, and Hujun Yin. 2025. PathoHR: Hierarchical Reasoning for Vision-Language Models in Pathology. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 2296–2311, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): PathoHR: Hierarchical Reasoning for Vision-Language Models in Pathology (Huang et al., Findings 2025)
PDF: https://preview.aclanthology.org/ingest-luhme/2025.findings-emnlp.124.pdf
Checklist: 2025.findings-emnlp.124.checklist.pdf
