Hail to the Thief: Exploring Attacks and Defenses in Decentralised GRPO

Nikolay Blagoev; Oguzhan Ersoy; Lydia Chen

Abstract

Group Relative Policy Optimization (GRPO) has been widely adopted for the post-training of Large Language Models (LLMs). In GRPO, the model answers prompts and learns preferred behaviour via reinforcement learning. Owing to its small communication volume, GRPO is inherently suitable for decentralised training: multiple nodes can answer the prompts concurrently and exchange their completions as strings. In this work, we explore the robustness of decentralised GRPO by presenting the first adversarial attacks and countermeasures. We present a diverse set of attacks in which malicious nodes poison benign models by sharing poisoned completions. We demonstrate these attacks on math and coding tasks and show that an adversary can achieve attack success rates of up to 100% in as few as 50 iterations. To mitigate the attacks, we propose two defense mechanisms that either check the logit probabilities of completions or use an LLM judge to filter completions. The defenses prevent all attacks except the DoS attack, which causes unnecessarily lengthy but conceptually correct completions. The code for both attacks and defenses can be found at: https://github.com/gensyn-ai/HTTT.

Anthology ID: 2026.findings-acl.1950
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 39130–39148
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1950/
Cite (ACL): Nikolay Blagoev, Oguzhan Ersoy, and Lydia Chen. 2026. Hail to the Thief: Exploring Attacks and Defenses in Decentralised GRPO. In Findings of the Association for Computational Linguistics: ACL 2026, pages 39130–39148, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Hail to the Thief: Exploring Attacks and Defenses in Decentralised GRPO (Blagoev et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1950.pdf
Checklist: 2026.findings-acl.1950.checklist.pdf
