LogicAttack: Adversarial Attacks for Evaluating Logical Consistency of Natural Language Inference
Mutsumi Nakamura, Santosh Mashetty, Mihir Parmar, Neeraj Varshney, Chitta Baral
Abstract
Recently, Large Language Models (LLMs) such as GPT-3, ChatGPT, and FLAN have led to impressive progress in Natural Language Inference (NLI) tasks. However, these models may rely on simple heuristics or artifacts in the evaluation data to achieve their high performance, which suggests that they still suffer from logical inconsistency. To assess the logical consistency of these models, we propose LogicAttack, a method to attack NLI models using diverse logical forms of premise and hypothesis, providing a more robust evaluation of their performance. Our approach leverages a range of inference rules from propositional logic, such as Modus Tollens and Bidirectional Dilemma, to generate effective adversarial attacks and identify common vulnerabilities across multiple NLI models. We achieve an average Attack Success Rate (ASR) of ~53% across multiple logic-based attacks. Moreover, we demonstrate that incorporating generated attack samples into training enhances the logical reasoning ability of the target model and decreases its vulnerability to logic-based attacks. Data and source code are available at https://github.com/msantoshmadhav/LogicAttack.
- Anthology ID:
- 2023.findings-emnlp.889
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 13322–13334
- URL:
- https://aclanthology.org/2023.findings-emnlp.889
- DOI:
- 10.18653/v1/2023.findings-emnlp.889
- Cite (ACL):
- Mutsumi Nakamura, Santosh Mashetty, Mihir Parmar, Neeraj Varshney, and Chitta Baral. 2023. LogicAttack: Adversarial Attacks for Evaluating Logical Consistency of Natural Language Inference. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 13322–13334, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- LogicAttack: Adversarial Attacks for Evaluating Logical Consistency of Natural Language Inference (Nakamura et al., Findings 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/2023.findings-emnlp.889.pdf
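
The abstract names two propositional inference rules, Modus Tollens and Bidirectional Dilemma, and reports an Attack Success Rate. The sketch below is illustrative only and is not taken from the authors' repository: it checks by truth table that both rules are valid, and computes an ASR under the assumption that an attack succeeds whenever the model fails to predict the gold label "entailment". The helper names `implies`, `is_valid`, and `attack_success_rate`, and the label strings, are hypothetical.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: a -> b."""
    return (not a) or b


def is_valid(premise, conclusion, num_vars: int) -> bool:
    """True if the conclusion holds in every assignment that satisfies the premise."""
    return all(
        conclusion(*vals)
        for vals in product([False, True], repeat=num_vars)
        if premise(*vals)
    )


# Modus Tollens: (p -> q) and not q, therefore not p.
modus_tollens = is_valid(
    lambda p, q: implies(p, q) and not q,
    lambda p, q: not p,
    2,
)

# Bidirectional Dilemma: (p -> q), (r -> s), (p or not s), therefore (q or not r).
bidirectional_dilemma = is_valid(
    lambda p, q, r, s: implies(p, q) and implies(r, s) and (p or not s),
    lambda p, q, r, s: q or not r,
    4,
)

# Contrast case: affirming the consequent is not a valid rule.
affirming_consequent = is_valid(
    lambda p, q: implies(p, q) and q,
    lambda p, q: p,
    2,
)


def attack_success_rate(predictions, gold: str = "entailment") -> float:
    """Assumed definition: share of attack samples where the prediction differs from gold."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    misses = sum(1 for label in predictions if label != gold)
    return misses / len(predictions)


if __name__ == "__main__":
    print(modus_tollens, bidirectional_dilemma, affirming_consequent)
    # Hypothetical model outputs for four logically entailed premise/hypothesis pairs.
    print(attack_success_rate(["entailment", "neutral", "contradiction", "entailment"]))
```

Because both rules are valid, a hypothesis built from them is entailed by its premise, so any other prediction on such a pair counts as a successful attack under the assumed definition.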