Arun Das
2025
FigEx: Aligned Extraction of Scientific Figures and Captions
Jifeng Song
|
Arun Das
|
Ge Cui
|
Yufei Huang
Findings of the Association for Computational Linguistics: EMNLP 2025
Automatic understanding of figures in scientific papers is challenging since they often contain subfigures and subcaptions in complex layouts. In this paper, we propose FigEx, a vision-language model to extract aligned pairs of subfigures and subcaptions from scientific papers. We also release BioSci-Fig, a curated dataset of 7,174 compound figures with annotated subfigure bounding boxes and aligned subcaptions. On BioSci-Fig, FigEx improves subfigure detection APb over Grounding DINO by 0.023 and boosts caption separation BLEU over Llama-2-13B by 0.465. The source code is available at: https://github.com/Huang-AI4Medicine-Lab/FigEx.