EVADE: LLM-Based Explanation Generation and Validation for Error Detection in NLI

Longfei Zuo, Barbara Plank, Siyao Peng


Abstract
High-quality datasets are critical for training and evaluating reliable NLP models. In tasks like natural language inference (NLI), human label variation (HLV) arises when multiple labels are valid for the same instance, making it difficult to separate annotation errors from plausible variation. An earlier framework, VariErr (Weber-Genzel et al., 2024), asks multiple annotators to explain their label decisions in a first round and flags errors through validity judgments in a second round. However, two rounds of manual annotation are costly and may limit the coverage of plausible labels or explanations. We propose a new framework, EVADE, which uses large language models (LLMs) to generate and validate explanations in order to detect errors. We perform a comprehensive analysis comparing human- and LLM-detected errors for NLI across distribution comparison, validation overlap, and impact on model fine-tuning. Our experiments show that LLM validation refines generated explanation distributions so that they align more closely with human annotations, and that removing LLM-detected errors from training data yields greater improvements in fine-tuning performance than removing errors identified by human annotators. This highlights the potential to scale error detection, reducing human effort while improving dataset quality under label variation.
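The abstract describes error detection through validity judgments on label explanations. The sketch below is a hypothetical illustration, not the authors' implementation. It assumes a VariErr-style rule: a label counts as an error when every explanation offered for it is judged invalid. The validator may be a human or an LLM, but no prompting or model is shown. The names `Judgment`, `flag_errors`, `label_distribution`, and `remove_errors` are invented for illustration, and the toy data is made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """Hypothetical record: a validity judgment on one (label, explanation) pair."""
    item_id: str
    label: str           # e.g. "entailment", "neutral", "contradiction"
    explanation: str
    is_valid: bool       # yes/no decision from a validator (human or LLM)


def flag_errors(judgments):
    """Return (item_id, label) pairs for which no explanation was judged valid.

    Assumption: a label is treated as an error only when every explanation
    offered for it is judged invalid.
    """
    supported = {}
    for j in judgments:
        key = (j.item_id, j.label)
        supported[key] = supported.get(key, False) or j.is_valid
    return {key for key, ok in supported.items() if not ok}


def label_distribution(judgments, item_id, valid_only=False):
    """Share of explanations per label for one item.

    With valid_only=True, only explanations judged valid are counted. This is
    one simple way to picture validation "refining" an explanation distribution.
    """
    counts = Counter(
        j.label
        for j in judgments
        if j.item_id == item_id and (j.is_valid or not valid_only)
    )
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def remove_errors(train_pairs, errors):
    """Drop flagged (item_id, label) pairs from a list of training examples."""
    return [pair for pair in train_pairs if pair not in errors]


if __name__ == "__main__":
    # Toy, made-up judgments for a single NLI item "p1".
    judgments = [
        Judgment("p1", "entailment", "The premise states this directly.", True),
        Judgment("p1", "neutral", "The time of day is never mentioned.", True),
        Judgment("p1", "contradiction", "The two events cannot co-occur.", False),
    ]
    errors = flag_errors(judgments)
    print(errors)  # {('p1', 'contradiction')}
    print(label_distribution(judgments, "p1"))
    print(label_distribution(judgments, "p1", valid_only=True))  # {'entailment': 0.5, 'neutral': 0.5}
    train = [("p1", "entailment"), ("p1", "neutral"), ("p1", "contradiction")]
    print(remove_errors(train, errors))  # [('p1', 'entailment'), ('p1', 'neutral')]
```

Counting explanations rather than annotators keeps the sketch agnostic about who produced each explanation. A real pipeline may weight, aggregate, or compare distributions differently, for example with a divergence measure against human label distributions.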
Anthology ID:
2026.findings-acl.65
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1286–1300
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.65/
Cite (ACL):
Longfei Zuo, Barbara Plank, and Siyao Peng. 2026. EVADE: LLM-Based Explanation Generation and Validation for Error Detection in NLI. In Findings of the Association for Computational Linguistics: ACL 2026, pages 1286–1300, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
EVADE: LLM-Based Explanation Generation and Validation for Error Detection in NLI (Zuo et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.65.pdf
Checklist:
2026.findings-acl.65.checklist.pdf