Kenli Li

2026

Integrating Data Validation with Large Language Models for Regulation-Guided Tabular Anomaly Detection
Haoliang Huang | Zihuang Cai | Zhuo Tang | Yifan Liu | Chen Tian | Kenli Li | Changjian Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In many real-world applications, such as medical insurance, regulations define the standards that data must comply with. Auditors typically use these regulations to identify anomalies in tabular data. However, existing tabular anomaly detection methods often focus on detecting anomalies based on data distribution without considering regulatory compliance. In this paper, we introduce a new task, Regulation-guided Tabular Anomaly Detection, which leverages regulations to detect anomalies in tabular data. We also develop three new datasets for this task. To address this task, we present RegValidator, a training-free method that integrates data validation with large language models (LLMs) for detecting anomalies. In this process, the LLMs generate ideas for anomaly detection from a regulation perspective, while data validation checks these ideas from a data distribution perspective. This process can be framed as a Budgeted Maximum Coverage problem, which can be solved by a constant-factor approximation algorithm with provable guarantees. Empirical results on the new datasets demonstrate that our method outperforms existing baselines. A field experiment in a commercial health insurance company also reveals the practical value of our method. Our code is available at https://github.com/hnu-vis/RegValidator.
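The Budgeted Maximum Coverage framing mentioned in the abstract can be illustrated with a small sketch. This is not the RegValidator implementation: the mapping used here (each candidate is a hypothetical validation rule that flags a set of row indices and carries a positive cost) is an assumption made only for illustration, as are all names such as `rules` and `budgeted_max_coverage`. The sketch uses the well-known modified greedy heuristic for this problem, which takes candidates by new coverage per unit cost and then compares the result against the best single affordable candidate.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical candidate: (row indices it flags, positive cost of using it).
Candidate = Tuple[Set[int], float]


def greedy_by_ratio(candidates: Dict[str, Candidate], budget: float) -> Tuple[List[str], Set[int]]:
    """Repeatedly take the affordable candidate with the most new rows per unit cost."""
    chosen: List[str] = []
    covered: Set[int] = set()
    remaining = budget
    pool = dict(candidates)
    while pool:
        best_name, best_ratio = None, 0.0
        for name, (rows, cost) in pool.items():
            if cost > remaining:
                continue  # does not fit in what is left of the budget
            ratio = len(rows - covered) / cost
            if ratio > best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            break  # nothing affordable adds new coverage
        rows, cost = pool.pop(best_name)
        chosen.append(best_name)
        covered |= rows
        remaining -= cost
    return chosen, covered


def budgeted_max_coverage(candidates: Dict[str, Candidate], budget: float) -> Tuple[List[str], Set[int]]:
    """Return the better of the ratio-greedy solution and the best single affordable candidate."""
    chosen, covered = greedy_by_ratio(candidates, budget)
    affordable = [name for name, (_, cost) in candidates.items() if cost <= budget]
    if affordable:
        single = max(affordable, key=lambda name: len(candidates[name][0]))
        if len(candidates[single][0]) > len(covered):
            return [single], set(candidates[single][0])
    return chosen, covered


# Illustrative data only: four made-up rules over seven table rows.
rules: Dict[str, Candidate] = {
    "rule_a": ({1, 2, 3}, 2.0),
    "rule_b": ({3, 4}, 1.0),
    "rule_c": ({1, 2, 3, 4, 5, 6}, 4.0),
    "rule_d": ({7}, 1.0),
}
chosen, covered = budgeted_max_coverage(rules, budget=4.0)
print(chosen, sorted(covered))
```

With a budget of 4.0 the ratio-greedy pass spends the budget on three cheap rules, while the single expensive rule covers more rows, so the final comparison step is what picks the better answer. That comparison is the part of the heuristic that keeps the approximation ratio bounded by a constant.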


Pruning is essential for the efficient deployment of Large Language Models (LLMs); however, it causes severe performance degradation due to the structural distortion induced by sparsity. Existing recovery strategies, such as LoRA, predominantly employ global fine-tuning, often overlooking the mechanistic root of this degradation: the layer-wise accumulation and amplification of local errors. To address this limitation, we propose LaCo (Layer-wise Compensation), a framework that reorients the recovery paradigm from global adaptation to hierarchical representation alignment. By sequentially optimizing each layer to reconstruct the model’s hidden states, LaCo effectively intercepts the error propagation chain at its source. Extensive experiments demonstrate that LaCo surpasses parameter-efficient baselines in both perplexity reduction and zero-shot reasoning. Notably, it reduces recovery-time memory usage to approximately 1/7 of the baseline and requires only 2,048 unlabeled samples to match a LoRA model trained on 50k examples, a ∼25× improvement in data efficiency.
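The idea of sequentially aligning each layer's hidden states can be shown on a toy model. The sketch below is not LaCo itself: it assumes scalar layers (a weight and bias followed by ReLU) and swaps in a hypothetical per-layer output correction (a scale and shift) fitted by closed-form least squares. Every name in it is illustrative. What it shares with the description above is the order of operations: each pruned layer is fed the already-corrected output of the previous layer and is fitted to the dense model's hidden state at the same depth, using unlabeled calibration inputs only.

```python
from typing import List, Optional, Tuple

Layer = Tuple[float, float]       # toy scalar layer: (weight, bias)
Correction = Tuple[float, float]  # hypothetical per-layer output fix: (scale, shift)


def layer_forward(layer: Layer, xs: List[float]) -> List[float]:
    w, b = layer
    return [max(0.0, w * x + b) for x in xs]  # scalar linear map followed by ReLU


def fit_affine(ys: List[float], ts: List[float]) -> Correction:
    """Closed-form least squares for ts ~ scale * ys + shift."""
    n = len(ys)
    mean_y, mean_t = sum(ys) / n, sum(ts) / n
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_y == 0.0:
        return 1.0, mean_t - mean_y  # constant input: only a shift can be fitted
    scale = sum((y - mean_y) * (t - mean_t) for y, t in zip(ys, ts)) / var_y
    return scale, mean_t - scale * mean_y


def run(layers: List[Layer], xs: List[float],
        corrections: Optional[List[Correction]] = None) -> List[float]:
    """Forward pass, optionally applying a correction after every layer."""
    hs = list(xs)
    for i, layer in enumerate(layers):
        hs = layer_forward(layer, hs)
        if corrections is not None:
            scale, shift = corrections[i]
            hs = [scale * h + shift for h in hs]
    return hs


def compensate(dense: List[Layer], pruned: List[Layer], calib: List[float]) -> List[Correction]:
    """Fit one correction per layer, front to back, against the dense hidden states."""
    corrections: List[Correction] = []
    h_dense, h_pruned = list(calib), list(calib)
    for dense_layer, pruned_layer in zip(dense, pruned):
        h_dense = layer_forward(dense_layer, h_dense)  # target hidden state
        raw = layer_forward(pruned_layer, h_pruned)    # pruned layer on corrected input
        scale, shift = fit_affine(raw, h_dense)
        h_pruned = [scale * y + shift for y in raw]    # corrected state feeds the next layer
        corrections.append((scale, shift))
    return corrections


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Made-up weights; the "pruned" model is just a perturbed copy of the dense one.
dense = [(2.0, 0.5), (0.5, 1.0), (1.5, 0.0)]
pruned = [(1.5, 0.5), (0.5, 0.0), (1.0, 0.0)]
calib = [1.0, 2.0, 3.0, 4.0]  # unlabeled calibration inputs

corrections = compensate(dense, pruned, calib)
print("uncorrected error:", mse(run(pruned, calib), run(dense, calib)))
print("corrected error:  ", mse(run(pruned, calib, corrections), run(dense, calib)))
```

In this toy setting every layer is affine on the positive calibration inputs, so the fitted corrections recover the dense outputs up to floating-point error. A real network would only be approximated, and the actual method may parameterize and optimize the compensation quite differently.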

