Liujuan Cao


2026

Large language models (LLMs) are increasingly deployed in high-stakes domains that rely on tabular data (e.g., financial reporting), where undetected logical inconsistencies, such as totals that do not match their components, can lead to critical errors. Yet the ability of LLMs to identify such inconsistencies remains poorly understood, hindered by the absence of standardized evaluation frameworks and cell-level annotated datasets. To bridge this gap, we propose SEC-Fintables, a comprehensive benchmark comprising 103,395 real-world and error-injected table instances, alongside a novel evaluation protocol that decomposes inconsistency detection into granular sub-tasks. Evaluating both proprietary and open-source LLMs on SEC-Fintables, we find that contemporary LLMs exhibit only partial competence in detecting logical inconsistencies. Our study reveals key limitations of current LLMs and opportunities for improving them. We believe SEC-Fintables and our evaluation protocol can serve as a practical resource for advancing reliable inconsistency detection by LLMs in tabular reasoning. We release SEC-Fintables at https://github.com/XIEFOX/SEC-Fintables.
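To make the notion of a "mismatched total" concrete, the following minimal sketch checks whether a reported total equals the sum of its component rows. It is an illustration only, not the SEC-Fintables protocol: the row layout, the `find_total_mismatches` helper, the `"Total"` label, and the tolerance are all assumptions introduced here.

```python
from decimal import Decimal


def find_total_mismatches(rows, total_label="Total", tolerance=Decimal("0.01")):
    """Return the columns whose total cell disagrees with the sum of its components.

    `rows` is a hypothetical table layout: a list of dicts with a "label" key
    plus one key per numeric column (values given as strings).
    """
    components = [r for r in rows if r["label"] != total_label]
    total_row = next(r for r in rows if r["label"] == total_label)
    mismatched = []
    for column, reported in total_row.items():
        if column == "label":
            continue
        # Decimal avoids binary floating-point noise in currency-like values.
        expected = sum((Decimal(r[column]) for r in components), Decimal("0"))
        if abs(expected - Decimal(reported)) > tolerance:
            mismatched.append(column)
    return mismatched


# Illustrative data, not taken from the benchmark.
table = [
    {"label": "Product revenue", "2023": "120.5", "2024": "140.0"},
    {"label": "Service revenue", "2023": "30.0", "2024": "35.5"},
    {"label": "Total", "2023": "150.5", "2024": "180.5"},  # 2024 components sum to 175.5
]

print(find_total_mismatches(table))
```

A rule-based check like this only works when the table structure is already known; the benchmark instead asks whether an LLM can locate such cells without being told which rows are totals.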
Factual knowledge stored in Large Language Models (LLMs) inevitably becomes outdated or erroneous over time, making it critical to update these models without incurring the high cost of retraining. Existing sequential knowledge editing methods predominantly rely on strict orthogonal projection to preserve previously edited knowledge. However, this overly strict constraint limits gradient expressiveness, and model generalization and overall performance degrade significantly as the number of edits increases. To address this challenge, we propose Dual-Importance Projection Editing (DipEdit). This method leverages Singular Value Decomposition (SVD) to identify critical gradient subspaces and introduces a dual mechanism comprising "accumulated importance" and "projection importance." Unlike traditional approaches that enforce strict orthogonality, DipEdit dynamically scales the gradient components parallel to key subspaces according to their projection importance, rather than discarding them outright. This approach enhances the model’s adaptability to new knowledge while preserving as much historical knowledge as possible. Extensive experiments on five mainstream LLMs using the ZsRE and CounterFact datasets demonstrate that DipEdit effectively handles thousands of sequential edits. The proposed method achieves an average comprehensive performance improvement of 10.36% and effectively maintains the model’s general capabilities on downstream tasks. Code is available at: https://github.com/czhhhla/DipEdit.
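The contrast between strict orthogonal projection and importance-scaled projection can be sketched with NumPy. This is not the DipEdit implementation: the abstract does not define "accumulated importance" or "projection importance", so the `importance` vector below (normalized singular values) is a placeholder assumption, and `key_subspace` and `soft_project` are hypothetical helper names.

```python
import numpy as np


def key_subspace(past_grads, rank):
    """Top-`rank` left singular vectors (and singular values) of past gradients.

    `past_grads` holds one past edit gradient per column.
    """
    U, S, _ = np.linalg.svd(past_grads, full_matrices=False)
    return U[:, :rank], S[:rank]


def soft_project(grad, basis, importance):
    """Shrink the part of `grad` inside span(basis) instead of removing it.

    importance[i] lies in [0, 1]: 1 removes the component along basis vector i
    (strict orthogonal projection), 0 leaves that component untouched.
    """
    coeffs = basis.T @ grad  # coordinates of grad in the key subspace
    return grad - basis @ (importance * coeffs)


rng = np.random.default_rng(0)
past = rng.normal(size=(8, 5))       # toy setup: 8 parameters, 5 past edits
basis, sing = key_subspace(past, rank=3)

importance = sing / sing.max()       # placeholder score, assumed for illustration
g = rng.normal(size=8)               # gradient of a new edit

g_strict = soft_project(g, basis, np.ones(3))  # strict orthogonal projection
g_soft = soft_project(g, basis, importance)    # scaled, not discarded

print(np.linalg.norm(basis.T @ g_strict))  # numerically zero
print(np.linalg.norm(basis.T @ g_soft))    # part of the in-subspace signal remains
```

With all importance values set to 1 the update reduces to the strict projection that the abstract criticizes; smaller values keep some of the gradient that lies along previously used directions, which is the trade-off the method is described as tuning.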