MoNaCo: More Natural and Complex Questions for Reasoning Across Dozens of Documents
Tomer Wolfson, Harsh Trivedi, Mor Geva, Yoav Goldberg, Dan Roth, Tushar Khot, Ashish Sabharwal, Reut Tsarfaty
Abstract
Automated agents, powered by large language models (LLMs), are emerging as the go-to tool for querying information. However, evaluation benchmarks for LLM agents rarely feature natural questions that are both information-seeking and genuinely time-consuming for humans. To address this gap, we introduce MoNaCo, a benchmark of 1,315 natural and time-consuming questions that require dozens, and at times hundreds, of intermediate steps to solve, far more than any existing QA benchmark. To build MoNaCo, we developed a decomposed annotation pipeline to elicit and manually answer real-world time-consuming questions at scale. Frontier LLMs evaluated on MoNaCo achieve at most 61.2% F1, hampered by low recall and hallucinations. Our results underscore the limitations of LLM-powered agents in handling the complexity and sheer breadth of real-world information-seeking tasks, with MoNaCo providing an effective resource for tracking such progress. The MoNaCo benchmark, codebase, prompts, and model predictions are all publicly available at: https://tomerwolgithub.github.io/monaco.
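Since the headline result is an answer F1 below 62%, a brief illustration of how list-style answers are commonly scored may help. The sketch below computes set-based F1 (precision and recall over predicted versus gold answer items). The `normalize` helper and exact string matching are assumptions for illustration only, not MoNaCo's official scorer; the actual evaluation code is available in the codebase linked above.

```python
# Minimal sketch of set-based answer F1 for multi-answer questions.
# NOTE: illustrative only -- MoNaCo's official scorer may normalize
# and match answers differently; see the linked codebase.

def normalize(ans: str) -> str:
    """Lowercase and strip whitespace (assumed normalization)."""
    return ans.strip().lower()

def answer_f1(predicted: list[str], gold: list[str]) -> float:
    """F1 over the overlap of predicted and gold answer sets."""
    pred_set = {normalize(a) for a in predicted}
    gold_set = {normalize(a) for a in gold}
    if not pred_set or not gold_set:
        # Both empty counts as a perfect match; one empty scores zero.
        return float(pred_set == gold_set)
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)

# A model that recovers only some gold answers is capped by recall,
# the failure mode the abstract highlights:
print(answer_f1(["France", "Spain"], ["france", "spain", "italy", "greece"]))
# P = 1.0, R = 0.5 -> F1 = 0.667
```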
- Anthology ID: 2026.tacl-1.2
- Volume: Transactions of the Association for Computational Linguistics, Volume 14
- Year: 2026
- Address: Cambridge, MA
- Venue: TACL
- Publisher: MIT Press
- Pages: 23–46
- URL: https://preview.aclanthology.org/ingest-eacl/2026.tacl-1.2/
- DOI: 10.1162/tacl.a.64
- Cite (ACL): Tomer Wolfson, Harsh Trivedi, Mor Geva, Yoav Goldberg, Dan Roth, Tushar Khot, Ashish Sabharwal, and Reut Tsarfaty. 2026. MoNaCo: More Natural and Complex Questions for Reasoning Across Dozens of Documents. Transactions of the Association for Computational Linguistics, 14:23–46.
- Cite (Informal): MoNaCo: More Natural and Complex Questions for Reasoning Across Dozens of Documents (Wolfson et al., TACL 2026)
- PDF: https://preview.aclanthology.org/ingest-eacl/2026.tacl-1.2.pdf