Harsha Aduri
2026
DeepResearch Retail: Benchmarking Tool-Augmented Deep Research in the E-Commerce Domain
Rafael Ferreira | Flavio Di Palo | Huilin Lu | Ayush Jain | Harsha Aduri
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Deep Research (DR) systems autonomously retrieve and synthesize information from web sources; however, industrial DR applications face a critical gap: the effective integration of internal tools with web search. In this work, we introduce DeepResearch Retail, an evaluation framework grounded in real-world e-commerce data for assessing Deep Research with tools (DR+Tools) in realistic commercial settings. The framework evaluates both the factual faithfulness and the multidimensional quality of responses produced by agents reasoning over heterogeneous web and internal data sources. We further present Hybrid-ReAct, a multi-agent architecture that demonstrates how collaborative reasoning and tool use can produce evidence-grounded answers. Experimental results validate the framework’s utility, showing that agent performance improves when agents leverage web-page information and multi-agent specialization.
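To make the DR+Tools setting concrete, the following is a minimal, hypothetical sketch of a generic ReAct-style loop (alternating reasoning steps and tool calls) that gathers evidence from both a web source and an internal source before answering. It is not the Hybrid-ReAct implementation: the tool names `web_search` and `internal_catalog`, the `Step`/`Trace` types, the stub observations, and the scripted policy standing in for an LLM are all illustrative assumptions, and the sketch omits multi-agent specialization entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A tool maps a query string to an observation string (hypothetical interface).
Tool = Callable[[str], str]


@dataclass
class Step:
    thought: str   # the policy's reasoning for this step
    action: str    # a tool name, or "finish"
    argument: str  # the tool query, or the final answer when action == "finish"


@dataclass
class Trace:
    steps: List[Tuple[Step, str]] = field(default_factory=list)  # (step, observation)
    answer: str = ""


def react_loop(policy: Callable[[List[Tuple[Step, str]]], Step],
               tools: Dict[str, Tool], max_steps: int = 5) -> Trace:
    """Alternate reasoning (policy calls) and acting (tool calls) until the policy finishes."""
    trace = Trace()
    for _ in range(max_steps):
        step = policy(trace.steps)
        if step.action == "finish":
            trace.answer = step.argument
            return trace
        if step.action in tools:
            observation = tools[step.action](step.argument)
        else:
            observation = f"error: unknown tool {step.action!r}"
        trace.steps.append((step, observation))
    trace.answer = "no answer within step budget"
    return trace


# Hypothetical stub tools: one "web" source and one "internal" source.
tools: Dict[str, Tool] = {
    "web_search": lambda q: "Reviews mention the X100 battery lasts 10 hours.",
    "internal_catalog": lambda q: "SKU X100: price 49.99, in stock.",
}


def scripted_policy(history: List[Tuple[Step, str]]) -> Step:
    # Stand-in for an LLM: consult internal data, then the web, then answer from both.
    if len(history) == 0:
        return Step("Check the internal catalog.", "internal_catalog", "X100")
    if len(history) == 1:
        return Step("Check the web for reviews.", "web_search", "X100 battery life")
    evidence = " ".join(obs for _, obs in history)
    return Step("Both sources consulted.", "finish", evidence)


trace = react_loop(scripted_policy, tools)
```

Here the final answer is simply the concatenated observations, so every statement in it is traceable to a tool call; a real system would have the model compose the answer and would need a separate faithfulness check against the recorded observations.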