Integrating Data Validation with Large Language Models for Regulation-Guided Tabular Anomaly Detection

Haoliang Huang; Zihuang Cai; Zhuo Tang; Yifan Liu; Chen Tian; Kenli Li; Changjian Chen

Abstract

In many real-world applications, such as medical insurance, regulations define the standards that data must comply with. Auditors typically use these regulations to identify anomalies in tabular data. However, existing tabular anomaly detection methods often detect anomalies from the data distribution alone, without considering regulatory compliance. In this paper, we introduce a new task, Regulation-guided Tabular Anomaly Detection, which leverages regulations to detect anomalies in tabular data, and we develop three new datasets for it. To address this task, we present RegValidator, a training-free method that integrates data validation with large language models (LLMs). The LLMs generate ideas for anomaly detection from a regulation perspective, while data validation checks these ideas from a data distribution perspective. This process can be framed as a Budgeted Maximum Coverage problem, which can be solved by a constant-factor approximation algorithm with provable guarantees. Empirical results on the new datasets demonstrate that our method outperforms existing baselines. A field experiment at a commercial health insurance company also reveals the practical value of our method. Our code is available at https://github.com/hnu-vis/RegValidator.
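To make the Budgeted Maximum Coverage framing concrete, the sketch below shows the textbook cost-ratio greedy heuristic for that problem, which is known to give a constant-factor approximation when combined with a best-single-candidate safeguard. It is not taken from the paper and may differ from what RegValidator actually does. The mapping onto the task is an assumption made for illustration: each candidate stands for a hypothetical LLM-proposed validation rule with a positive cost and a set of record IDs it flags, and the names `greedy_budgeted_max_coverage`, `rule_a`, `rule_b`, and `rule_c` are invented here.

```python
from typing import Dict, Hashable, List, Set, Tuple


def greedy_budgeted_max_coverage(
    candidates: Dict[str, Tuple[float, Set[Hashable]]],
    budget: float,
) -> Tuple[List[str], Set[Hashable]]:
    """Select candidates within `budget` to cover as many elements as possible.

    `candidates` maps a (hypothetical) rule name to (cost, covered elements).
    Costs are assumed to be strictly positive.
    """
    chosen: List[str] = []
    covered: Set[Hashable] = set()
    remaining = budget
    pool = dict(candidates)

    while pool:
        # Find the affordable candidate with the most newly covered
        # elements per unit cost.
        best_name, best_ratio = None, 0.0
        for name, (cost, elems) in pool.items():
            gain = len(elems - covered)
            if cost <= remaining and gain > 0 and gain / cost > best_ratio:
                best_name, best_ratio = name, gain / cost
        if best_name is None:
            break  # nothing affordable adds coverage
        cost, elems = pool.pop(best_name)
        chosen.append(best_name)
        covered |= elems
        remaining -= cost

    # Safeguard: one affordable candidate on its own may beat the greedy pick.
    single = max(
        (n for n, (c, _) in candidates.items() if c <= budget),
        key=lambda n: len(candidates[n][1]),
        default=None,
    )
    if single is not None and len(candidates[single][1]) > len(covered):
        return [single], set(candidates[single][1])
    return chosen, covered


# Illustrative data only: rule names, costs, and record IDs are made up.
rules = {
    "rule_a": (1.0, {1, 2}),
    "rule_b": (1.0, {2, 3}),
    "rule_c": (3.0, {1, 2, 3, 4, 5}),
}
chosen, covered = greedy_budgeted_max_coverage(rules, budget=3.0)
print(chosen, sorted(covered))
```

With a budget of 3.0, the ratio-based loop picks the two cheap rules first and can no longer afford `rule_c`, so the final comparison against the best single candidate is what recovers the larger coverage. That comparison is the standard reason the plain ratio greedy is paired with a safeguard.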

Anthology ID: 2026.acl-long.297
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 6559–6581
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.297/
Cite (ACL): Haoliang Huang, Zihuang Cai, Zhuo Tang, Yifan Liu, Chen Tian, Kenli Li, and Changjian Chen. 2026. Integrating Data Validation with Large Language Models for Regulation-Guided Tabular Anomaly Detection. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6559–6581, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Integrating Data Validation with Large Language Models for Regulation-Guided Tabular Anomaly Detection (Huang et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.297.pdf
Checklist: 2026.acl-long.297.checklist.pdf
