FigEx: Aligned Extraction of Scientific Figures and Captions

Jifeng Song, Arun Das, Ge Cui, Yufei Huang


Abstract
Automatic understanding of figures in scientific papers is challenging since they often contain subfigures and subcaptions in complex layouts. In this paper, we propose FigEx, a vision-language model to extract aligned pairs of subfigures and subcaptions from scientific papers. We also release BioSci-Fig, a curated dataset of 7,174 compound figures with annotated subfigure bounding boxes and aligned subcaptions. On BioSci-Fig, FigEx improves subfigure detection APb over Grounding DINO by 0.023 and boosts caption separation BLEU over Llama-2-13B by 0.465. The source code is available at: https://github.com/Huang-AI4Medicine-Lab/FigEx.
Anthology ID:
2025.findings-emnlp.899
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
16558–16571
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.899/
DOI:
10.18653/v1/2025.findings-emnlp.899
Cite (ACL):
Jifeng Song, Arun Das, Ge Cui, and Yufei Huang. 2025. FigEx: Aligned Extraction of Scientific Figures and Captions. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 16558–16571, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
FigEx: Aligned Extraction of Scientific Figures and Captions (Song et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.899.pdf
Checklist:
 2025.findings-emnlp.899.checklist.pdf