Kelly L. Anderson

2025

pdf bib abs
MatViX: Multimodal Information Extraction from Visually Rich Articles
Ghazal Khalighinejad | Sharon Scott | Ollie Liu | Kelly L. Anderson | Rickard Stureborg | Aman Tyagi | Bhuwan Dhingra
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Multimodal information extraction (MIE) is crucial for scientific literature, where valuable data is often spread across text, figures, and tables. In materials science, extracting structured information from research articles can accelerate the discovery of new materials. However, the multimodal nature and complex interconnections of scientific content present challenges for traditional text-based methods. We introduce MatViX, a benchmark consisting of 324 full-length research articles and 1,688 complex structured JSON files, carefully curated by domain experts in polymer nanocomposites and biodegradation. These JSON files are extracted from text, tables, and figures in full-length documents, providing a comprehensive challenge for MIE. We introduce a novel evaluation method to assess the accuracy of curve similarity and the alignment of hierarchical structures. Additionally, we benchmark vision-language models (VLMs) in a zero-shot manner, capable of processing long contexts and multimodal inputs. Our results demonstrate significant room for improvement in current models.

Co-authors

Aman Tyagi 1

Venues

naacl1

Fix data