The AI Committee: A Multi-Agent Framework for Automated Validation and Remediation of Web-Sourced Data

Sunith Vallabhaneni, Thomas Berkane, Maimuna S. Majumder


Abstract
Many research areas rely on data from the web to gain insights and test their methods. However, collecting comprehensive research datasets often demands manually reviewing many web pages to identify and record relevant data points, which is labor-intensive and susceptible to error. While the emergence of large language model (LLM)-powered web agents has begun to automate parts of this process, they often struggle to ensure the validity of the data they collect. Indeed, these agents exhibit several recurring failure modes, including hallucinating or omitting values, misinterpreting page semantics, and failing to detect invalid information, which are subtle and difficult to detect and correct manually. To address this, we introduce the AI Committee, a novel model-agnostic multi-agent system that automates the process of validating and remediating web-sourced datasets. Each agent is specialized in a distinct task in the data quality assurance pipeline, from source scrutiny and fact-checking to data remediation and integrity validation. The AI Committee leverages various LLM capabilities, including in-context learning for dataset adaptation, chain-of-thought reasoning for complex semantic validation, and a self-correction loop for data remediation, all without task-specific training. We demonstrate the effectiveness of our system by applying it to three real-world datasets, showing that it generalizes across LLMs and significantly outperforms baseline approaches, achieving data completeness up to 73.3% and precision up to 97.3%. We additionally conduct an ablation study demonstrating the contribution of each agent to the Committee's performance. This work is released as an open-source tool for the research community.
Anthology ID:
2026.eacl-demo.41
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Danilo Croce, Jochen Leidner, Nafise Sadat Moosavi
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
583–590
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-demo.41/
Cite (ACL):
Sunith Vallabhaneni, Thomas Berkane, and Maimuna S. Majumder. 2026. The AI Committee: A Multi-Agent Framework for Automated Validation and Remediation of Web-Sourced Data. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 583–590, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
The AI Committee: A Multi-Agent Framework for Automated Validation and Remediation of Web-Sourced Data (Vallabhaneni et al., EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-demo.41.pdf