Srihari Unnikrishnan


2026

Tabular anomaly detection is challenging because real-world tables contain heterogeneous columns, ranging from structured identifiers to free-form text. Existing methods face a fundamental trilemma: rule-based systems require extensive manual configuration and fail on novel schemas; statistical methods scale efficiently but miss semantic errors; and LLM-based approaches understand semantics but incur prohibitive per-cell inference costs. No prior method simultaneously addresses semantic heterogeneity, domain-specific validation rules, and enterprise-scale processing.

We introduce TabGuard, an agentic framework that resolves this trilemma through semantic routing. Using LLM function calling, the system analyzes a small sample of each column and dynamically selects the most effective validation strategy, routing to a regex-based validator for syntactic patterns, a code-generation validator for domain-specific rules (such as Luhn checksums for credit cards), or an embedding-based validator for distributional outliers. This architecture decouples expensive cognitive reasoning (O(m) LLM calls for m columns) from scalable programmatic execution, enabling deployment on enterprise datasets without per-cell inference.
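
The routing idea can be illustrated with a minimal Python sketch. It is not TabGuard's implementation: `choose_strategy` is a hypothetical rule-based stand-in for the LLM function call, the names `STRATEGIES` and `find_anomalies` are invented for illustration, and the embedding-based validator is omitted. The point it tries to show is that the router is invoked once per column on a small sample, while every cell is checked by ordinary code.

```python
import re
from typing import Callable, Dict, List

Validator = Callable[[str], bool]


def luhn_valid(value: str) -> bool:
    """Return True if the digits in `value` pass the Luhn checksum."""
    digits = [int(ch) for ch in value if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def regex_validator(pattern: str) -> Validator:
    """Build a validator that requires the whole cell to match `pattern`."""
    compiled = re.compile(pattern)
    return lambda value: compiled.fullmatch(value) is not None


def choose_strategy(column_name: str, sample: List[str]) -> str:
    """Hypothetical stand-in for the LLM function call (assumption).

    A real router would send the column name and sample to an LLM and
    receive a strategy name back; here a simple heuristic plays that role.
    """
    if all(re.fullmatch(r"[\d -]{13,19}", v) for v in sample):
        return "luhn"
    return "email_regex"


# Illustrative strategy registry; the keys are invented for this sketch.
STRATEGIES: Dict[str, Validator] = {
    "luhn": luhn_valid,
    "email_regex": regex_validator(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def find_anomalies(
    table: Dict[str, List[str]], sample_size: int = 3
) -> Dict[str, List[int]]:
    """Return, per column, the row indices that fail the chosen validator."""
    anomalies: Dict[str, List[int]] = {}
    for name, values in table.items():
        # One routing decision per column: O(m) router calls for m columns.
        strategy = choose_strategy(name, values[:sample_size])
        validate = STRATEGIES[strategy]
        # Per-cell work is plain function calls, with no model inference.
        anomalies[name] = [i for i, v in enumerate(values) if not validate(v)]
    return anomalies


table = {
    "card": ["4111111111111111", "5500000000000004", "4111111111111112"],
    "email": ["a@example.com", "b@example.org", "not-an-email"],
}
print(find_anomalies(table))
```

In this toy table the third card number fails the Luhn check and the third email fails the pattern, so each column reports row index 2. Replacing `choose_strategy` with an actual function-calling request would leave the per-cell loop unchanged, which is the decoupling the abstract describes.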