Xingzhong Xu
2026
InsLogicBench: An Argumentation Logic Grounded Benchmark for Complex Insurance Claims Adjudication
Jin Liu | Yunpeng Liu | Keyi Wang | Jie Shi | Xiao Xu | Wenkang Huang | Xingzhong Xu | Xin Liang | Yanghua Xiao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Insurance claims adjudication demands not only accurate decisions but also interpretable reasoning grounded in policy clauses. However, existing benchmarks are limited to information retrieval or simple multiple-choice setups, which fail to require step-by-step inferences from facts to conclusions. To address this gap, we introduce InsLogicBench, a benchmark providing complete reasoning traces that link factual inputs, relevant policy clauses, and final verdicts. We construct the dataset using a controllable synthesis framework based on the Nested Toulmin Model. By capturing the defeasible logic of insurance policies through hierarchical truth assignment and enforcing validity via consistency verification, we ensure interpretability and logical rigor across generated examples. We evaluate eight Large Language Models (LLMs) on InsLogicBench. Results show significant difficulties in handling exception clauses and verifying missing conditions. Notably, models often produce correct final decisions but fail to provide precise justifications, highlighting a critical discrepancy between their decision accuracy and logical reasoning capabilities.