Harry Mackenzie


2026

Large language models (LLMs) frequently generate factually incorrect or unverifiable statements, motivating tool-augmented verification systems that combine model reasoning with external evidence retrieval. For factuality evaluation to be scientifically reliable, verification pipelines must be controllable and reproducible: retrieval configuration and reasoning behaviour should be explicitly configurable and stable across runs. In practice, many existing systems depend on commercial search APIs whose ranking policies and retrieval behaviours are opaque and externally controlled, introducing uncontrolled variability into evaluation. This makes it difficult to disentangle reasoning errors from retrieval effects. We present FactSearch, a reproducibility-oriented agentic fact search system for claim-level factuality verification, built on a locally aggregated open-source search infrastructure. FactSearch follows an agentic verification workflow: it decomposes model outputs into atomic factual claims, generates targeted search queries, retrieves supporting evidence via a self-hosted meta-search engine, and performs modular verification within a fully configurable pipeline. By treating retrieval infrastructure as a first-class component, the system enables systematic analysis of retrieval–reasoning interactions. An interactive web interface supports transparent inspection and practical deployment. The project is available at https://factsearch.github.io.
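The four-stage workflow described above (claim decomposition, query generation, retrieval, verification) can be illustrated with a minimal sketch. This is not FactSearch's implementation: every name here (`PipelineConfig`, `decompose`, `generate_queries`, `make_local_retriever`, `verify`, `run_pipeline`) is hypothetical, the stages are deliberately naive stand-ins for model-driven components, and an in-memory corpus stands in for the self-hosted meta-search engine. The sketch only shows how an explicit configuration and a locally controlled retriever make repeated runs yield identical results.

```python
from dataclasses import dataclass
from typing import Callable, List

Retriever = Callable[[str, int], List[str]]


@dataclass(frozen=True)
class PipelineConfig:
    # Hypothetical knobs; the abstract does not list the system's actual options.
    top_k: int = 3
    queries_per_claim: int = 1


@dataclass
class Verdict:
    claim: str
    queries: List[str]
    evidence: List[str]
    label: str  # "supported" or "unverified"


def decompose(text: str) -> List[str]:
    # Naive stand-in for claim decomposition: one claim per sentence.
    return [s.strip() for s in text.split(".") if s.strip()]


def generate_queries(claim: str, n: int) -> List[str]:
    # Naive stand-in for query generation: reuse the claim as the only query.
    return [claim][:n]


def make_local_retriever(corpus: List[str]) -> Retriever:
    # Stand-in for a self-hosted search backend: rank documents by term overlap,
    # breaking ties by corpus index so the ranking is deterministic.
    def retrieve(query: str, top_k: int) -> List[str]:
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(doc.lower().split())), i, doc)
            for i, doc in enumerate(corpus)
        ]
        scored.sort(key=lambda t: (-t[0], t[1]))
        return [doc for score, _, doc in scored[:top_k] if score > 0]

    return retrieve


def verify(claim: str, evidence: List[str]) -> str:
    # Toy verifier: a claim counts as supported only if some evidence
    # document contains every term of the claim.
    terms = set(claim.lower().split())
    hit = any(terms <= set(doc.lower().split()) for doc in evidence)
    return "supported" if hit else "unverified"


def run_pipeline(text: str, retrieve: Retriever, config: PipelineConfig) -> List[Verdict]:
    verdicts = []
    for claim in decompose(text):
        queries = generate_queries(claim, config.queries_per_claim)
        evidence: List[str] = []
        for query in queries:
            for doc in retrieve(query, config.top_k):
                if doc not in evidence:  # de-duplicate, preserving rank order
                    evidence.append(doc)
        verdicts.append(Verdict(claim, queries, evidence, verify(claim, evidence)))
    return verdicts
```

Because the retriever is passed in as an argument, the same claims can be re-run against a different backend or a different `top_k` while the other stages stay fixed, which is one way to separate retrieval effects from reasoning errors.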