AdvERSEM: Adversarial Robustness Testing and Training of LLM-based Groundedness Evaluators via Semantic Structure Manipulation

Kaustubh Dhole, Ramraj Chandradevan, Eugene Agichtein

Abstract
Evaluating outputs from large language models (LLMs) presents significant challenges, especially as hallucinations and adversarial manipulations are often difficult to detect. Existing evaluation methods lack robustness against subtle yet intentional linguistic alterations, necessitating novel techniques for reliably assessing model-generated content. Training accurate and robust groundedness evaluators is key to mitigating hallucinations and ensuring that model- or human-generated claims align with real-world evidence. However, as we show, many models that optimize for accuracy lack robustness to subtle variations of claims, making them brittle and unsuitable for real-world settings where adversaries employ purposeful, deceptive tactics such as hedging, which go beyond surface-level variations. To address this problem, we propose AdvERSem, a controllable adversarial approach that manipulates LLM output via Abstract Meaning Representation (AMR) to generate attack claims of multiple fine-grained types, followed by automatic verification of the correct label. By systematically manipulating a single linguistic facet at a time, AdvERSem provides an interpretable testbed for gauging robustness as well as useful training data. We demonstrate that using these AMR manipulations during training across multiple fact verification datasets improves the accuracy and robustness of groundedness evaluation while minimizing the need for costly annotated data. To encourage further systematic evaluation, we release AdvERSem-Test, a manually verified groundedness testbed.
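To make the graph-level manipulation concrete, below is a minimal, hypothetical sketch of one plausible attack type (polarity flipping) using the open-source penman library. The example claim, function name, and choice of attack type are illustrative assumptions, not the paper's actual implementation; the full AdvERSem pipeline would additionally require text-to-AMR parsing and AMR-to-text generation models (e.g., amrlib), which are omitted here.

```python
# A minimal sketch of one AMR-level attack (polarity flipping),
# assuming the `penman` library (pip install penman). Illustrative
# only; this is not the paper's released code.
import penman

# Hand-written AMR for the claim "The vaccine prevents infection."
CLAIM_AMR = """
(p / prevent-01
   :ARG0 (v / vaccine)
   :ARG1 (i / infect-01))
"""

def flip_polarity(amr_string: str) -> str:
    """Return an AMR string with the root concept negated,
    turning a supported claim into a refuted one."""
    graph = penman.decode(amr_string)
    top = graph.top  # variable of the root concept, here 'p'
    # Appending a ':polarity -' triple to the root is AMR's
    # standard encoding of negation.
    triples = list(graph.triples) + [(top, ':polarity', '-')]
    return penman.encode(penman.Graph(triples))

if __name__ == "__main__":
    print(flip_polarity(CLAIM_AMR))
    # (p / prevent-01
    #    :ARG0 (v / vaccine)
    #    :ARG1 (i / infect-01)
    #    :polarity -)
```

Because the edit is applied at the graph level, the new gold label (e.g., supported flipped to refuted) can plausibly be assigned automatically, without fresh human annotation, which matches the abstract's "automatic verification of the correct label."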
Anthology ID:
2025.starsem-1.32
Volume:
Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025)
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Lea Frermann, Mark Stevenson
Venue:
*SEM
Publisher:
Association for Computational Linguistics
Pages:
395–408
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.starsem-1.32/
Cite (ACL):
Kaustubh Dhole, Ramraj Chandradevan, and Eugene Agichtein. 2025. AdvERSEM: Adversarial Robustness Testing and Training of LLM-based Groundedness Evaluators via Semantic Structure Manipulation. In Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025), pages 395–408, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
AdvERSEM: Adversarial Robustness Testing and Training of LLM-based Groundedness Evaluators via Semantic Structure Manipulation (Dhole et al., *SEM 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.starsem-1.32.pdf