Dorian Midou
2026
Agent for Numerical Data Retrieval and Understanding by Code Generation and Multimodal Reasoning
Florian Baud | Feda Almuhisen | Dorian Midou
Findings of the Association for Computational Linguistics: ACL 2026
Numerical data from sensors and time series are widely used in scientific research fields such as nuclear fusion experiments, which generate vast amounts of complex, high-dimensional data. Efficient numerical data analysis tools are therefore crucial to accelerate experimental research. Large language models (LLMs) have emerged as promising tools for analyzing numerical data through natural language queries. However, LLMs struggle with this type of data because they were designed primarily for text. To overcome these limitations, we propose a model-agnostic and data-agnostic agent that processes numerical data through code generation and multimodal reasoning. Our agent achieves competitive performance against baselines on numerical data benchmarks such as sensor data classification and time series understanding, and it outperforms them on information retrieval benchmarks. We have also successfully applied our agent to nuclear fusion research, where physicists and tokamak operators interact with it to plan and analyze fusion experiments.
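The abstract does not describe how the agent is implemented. As a hedged illustration of the general code-generation pattern it names (letting a model write code that operates on the numbers, rather than reading raw numbers as text tokens), here is a minimal sketch. Every name in it (`fake_llm_generate_code`, `run_generated_code`, `answer_query`) is hypothetical: the "LLM" is a keyword-matching stub, and the restricted namespace is for illustration only and is not a secure sandbox.

```python
import statistics
from typing import Callable

# Hypothetical stand-in for an LLM call: maps a natural-language query to
# Python source that computes the answer and assigns it to `result`.
# A real agent would prompt a language model here instead.
def fake_llm_generate_code(query: str) -> str:
    q = query.lower()
    if "maximum" in q:
        return "result = max(signal)"
    if "mean" in q:
        return "result = statistics.mean(signal)"
    return "result = None"

def run_generated_code(code: str, signal: list[float]) -> object:
    # Execute the generated snippet with only the data, the statistics module,
    # and a few builtins in scope. NOTE: restricting __builtins__ like this is
    # NOT a real security boundary; it only keeps the sketch small.
    namespace = {
        "__builtins__": {"max": max, "min": min, "len": len, "sum": sum},
        "signal": signal,
        "statistics": statistics,
    }
    exec(code, namespace)
    return namespace.get("result")

def answer_query(
    query: str,
    signal: list[float],
    generate: Callable[[str], str] = fake_llm_generate_code,
) -> object:
    # Step 1: the (stubbed) model turns the question into code.
    code = generate(query)
    # Step 2: the code, not the model, does the numerical work.
    return run_generated_code(code, signal)

if __name__ == "__main__":
    sensor = [0.1, 0.4, 2.5, 1.2]  # toy sensor readings
    print(answer_query("What is the maximum of the signal?", sensor))  # 2.5
    print(answer_query("What is the mean value?", sensor))
```

The design choice this sketch highlights is that numerical results come from executed code rather than from the model's text predictions, so exact values such as a maximum are computed, not guessed.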