Quest2DataAgent: Automating End-to-End Scientific Data Collection
Tianyu Yang, Yuhan Liu, Sobin Alosious, Ethan A. Brown, Jason R. Rohr, Tengfei Luo, Xiangliang Zhang
Abstract
Scientific research often requires constructing high-quality datasets, yet current workflows remain labor-intensive and dependent on domain expertise. Existing approaches automate isolated steps such as retrieval or generation, but lack support for the full end-to-end data collection process. We present Quest2DataAgent, a general-purpose multi-agent framework for automating scientific data collection workflows. Given a natural language research question, it decomposes tasks into structured subtasks, retrieves relevant data using hybrid strategies, evaluates dataset quality, and generates visualizations through a conversational interface. We demonstrate its flexibility in two domains: EcoData for ecological research and PolyData for polymer materials. Both systems share the same core architecture but operate over distinct datasets and user needs. Human evaluations show that Quest2DataAgent significantly improves data relevance, usability, and time efficiency compared to manual collection and tool-assisted baselines. The framework is open-source and extensible to other domains.
- Anthology ID:
- 2025.emnlp-demos.36
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Ivan Habernal, Peter Schulam, Jörg Tiedemann
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 500–514
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-demos.36/
- Cite (ACL):
- Tianyu Yang, Yuhan Liu, Sobin Alosious, Ethan A. Brown, Jason R. Rohr, Tengfei Luo, and Xiangliang Zhang. 2025. Quest2DataAgent: Automating End-to-End Scientific Data Collection. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 500–514, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Quest2DataAgent: Automating End-to-End Scientific Data Collection (Yang et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-demos.36.pdf