Multi-Modal Data Exploration via Language Agents
Farhad Nooralahzadeh, Yi Zhang, Jonathan Fürst, Kurt Stockinger
Abstract
International enterprises, organizations, and hospitals collect large amounts of multi-modal data stored in databases, text documents, images, and videos. While there has been recent progress in the separate fields of multi-modal data exploration as well as in database systems that automatically translate natural language questions to database query languages, the research challenge of querying both structured databases and unstructured modalities (e.g., texts, images) in natural language remains largely unexplored. In this paper, we propose M2EX, a system that enables multi-modal data exploration via language agents. Our approach is based on the following research contributions: (1) Our system is inspired by a real-world use case that enables users to explore multi-modal information systems. (2) M2EX leverages an LLM-based agentic AI framework to decompose a natural language question into subtasks such as text-to-SQL generation and image analysis and to orchestrate modality-specific experts in an efficient query plan. (3) Experimental results on multi-modal datasets, encompassing relational data, text, and images, demonstrate that our system outperforms state-of-the-art multi-modal exploration systems, excelling in both accuracy and various performance metrics, including query latency, API costs, and planning efficiency, thanks to the more effective utilization of the reasoning capabilities of LLMs.
- Anthology ID:
- 2025.findings-ijcnlp.47
- Volume:
- Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
- Month:
- December
- Year:
- 2025
- Address:
- Mumbai, India
- Editors:
- Kentaro Inui, Sakriani Sakti, Haofen Wang, Derek F. Wong, Pushpak Bhattacharyya, Biplab Banerjee, Asif Ekbal, Tanmoy Chakraborty, Dhirendra Pratap Singh
- Venue:
- Findings
- Publisher:
- The Asian Federation of Natural Language Processing and The Association for Computational Linguistics
- Pages:
- 795–813
- URL:
- https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.findings-ijcnlp.47/
- Cite (ACL):
- Farhad Nooralahzadeh, Yi Zhang, Jonathan Fürst, and Kurt Stockinger. 2025. Multi-Modal Data Exploration via Language Agents. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 795–813, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.
- Cite (Informal):
- Multi-Modal Data Exploration via Language Agents (Nooralahzadeh et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.findings-ijcnlp.47.pdf