Transitive self-consistency evaluation of NLI models without gold labels

Wei Wu; Mark Last

Abstract

Natural Language Inference (NLI) is an important task in natural language processing. NLI models aim to automatically determine the logical relationship between a pair of sentences. However, recent studies based on gold labels assigned to sentence pairs by human experts have provided some evidence that NLI models tend to make inconsistent decisions during inference. Previous studies have used existing NLI datasets to test the transitive consistency of language models, but they test only variations of two of the four transitive consistency rules. To further evaluate the transitive consistency of NLI models, we propose a novel evaluation approach that tests all four rules automatically by generating adversarial examples via antonym replacement. Because we test self-consistency, human labeling of the generated adversarial examples is unnecessary. Our experiments on several benchmark datasets indicate that the examples generated by the proposed antonym replacement methodology can reveal transitive inconsistencies in state-of-the-art NLI models.
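
The idea of checking transitive self-consistency without gold labels can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the four rules, so the two entries in `RULES` are illustrative assumptions based on ordinary logic, and `check_triple`, `toy_predict`, and the toy sentences are hypothetical names and data introduced only for this example. The check compares a model's own predictions against each other, so no human-assigned label is consulted.

```python
from typing import Callable, Dict, Optional, Tuple

Label = str  # "entailment", "contradiction", or "neutral"
Predictor = Callable[[str, str], Label]  # (premise, hypothesis) -> label

# Illustrative rules only (an assumption, not the paper's rule set):
# maps (label(a, b), label(b, c)) to the label logically implied for (a, c).
RULES: Dict[Tuple[Label, Label], Label] = {
    ("entailment", "entailment"): "entailment",
    ("entailment", "contradiction"): "contradiction",
}


def check_triple(predict: Predictor, a: str, b: str, c: str) -> Optional[bool]:
    """Return True/False if a rule applies and the model is consistent or
    inconsistent with its own predictions; return None if no rule applies."""
    implied = RULES.get((predict(a, b), predict(b, c)))
    if implied is None:
        return None
    return predict(a, c) == implied


# Hypothetical stand-in for a real NLI model: a fixed lookup table.
TOY_PREDICTIONS: Dict[Tuple[str, str], Label] = {
    ("A man is sprinting.", "A man is running."): "entailment",
    ("A man is running.", "A man is moving."): "entailment",
    ("A man is sprinting.", "A man is moving."): "neutral",
}


def toy_predict(premise: str, hypothesis: str) -> Label:
    # Unknown pairs default to "neutral" in this toy model.
    return TOY_PREDICTIONS.get((premise, hypothesis), "neutral")


result = check_triple(
    toy_predict,
    "A man is sprinting.",
    "A man is running.",
    "A man is moving.",
)
print(result)  # False: the toy model violates entailment transitivity
```

In practice, `toy_predict` would be replaced by a call to an actual NLI model, and the triples would come from generated sentence pairs rather than a hand-written table.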

Anthology ID: 2025.emnlp-main.1152
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 22637–22653
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1152/
Cite (ACL): Wei Wu and Mark Last. 2025. Transitive self-consistency evaluation of NLI models without gold labels. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 22637–22653, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Transitive self-consistency evaluation of NLI models without gold labels (Wu & Last, EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1152.pdf
Checklist: 2025.emnlp-main.1152.checklist.pdf
