GRE Score: Generative Risk Evaluation for Large Language Models

Zaitang LI; Pin-Yu Chen; Tsung-Yi Ho

Abstract

Large Language Models (LLMs) have revolutionized generative tasks, but concerns about their trustworthiness and vulnerability to adversarial attacks persist. This paper introduces the Generative Robustness Evaluation (GRE) Score, a novel metric designed to assess LLMs’ resilience against adversarial red teaming attempts that may compromise model compliance and elicit undesired responses. Our approach utilizes conditional generation for synthetic text creation, offering an attack-independent evaluation of LLM robustness. By calculating the margin in refusal scores, we quantify the robustness of LLMs in an attack-agnostic manner. We evaluate our method on five dimensions with specified datasets, encompassing ethical considerations, safety protocols, and potential misuse scenarios. We present four contributions: (1) The GRE Score framework, which establishes a textual robustness certificate for LLMs against adversarial red teaming attempts, providing a theoretical foundation for quantifying model resilience. (2) Comprehensive evaluations across five dimensions using eight prominent LLMs, validating GRE Scores with adversarial red teaming attacks. Our method demonstrates a consistent ranking of LLM robustness when compared to the attack-based model ranking on TrustLLM while achieving a significant 5-8x speedup compared to traditional evaluation techniques. (3) Insights into the non-linear relationship between model scaling and performance, revealing that larger models do not always perform better, and an analysis of how instruction-tuning impacts robustness across LLMs. (4) The discovery that all evaluated LLMs exhibit lower performance in robustness and privacy tasks compared to other areas, highlighting a critical gap in capabilities.

Anthology ID: 2026.findings-acl.1202
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 24007–24033
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1202/
Cite (ACL): Zaitang LI, Pin-Yu Chen, and Tsung-Yi Ho. 2026. GRE Score: Generative Risk Evaluation for Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 24007–24033, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): GRE Score: Generative Risk Evaluation for Large Language Models (LI et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1202.pdf
Checklist: 2026.findings-acl.1202.checklist.pdf
