@inproceedings{sun-etal-2024-crowd,
    title = "{CROWD}: Certified Robustness via Weight Distribution for Smoothed Classifiers against Backdoor Attack",
    author = "Sun, Siqi  and
      Sen, Procheta  and
      Ruan, Wenjie",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2024.findings-emnlp.993/",
    doi = "10.18653/v1/2024.findings-emnlp.993",
    pages = "17056--17070",
    abstract = "Language models are vulnerable to clandestinely modified data and manipulation by attackers. Despite considerable research dedicated to enhancing robustness against adversarial attacks, the realm of provable robustness for backdoor attacks remains relatively unexplored. In this paper, we initiate a pioneering investigation into the certified robustness of NLP models against backdoor triggers. We propose a model-agnostic mechanism for large-scale models that applies to complex model structures without the need for assessing model architecture or internal knowledge. More importantly, we take recent advances in randomized smoothing theory and propose a novel weight-based distribution algorithm to enable semantic similarity and provide theoretical robustness guarantees. Experimentally, we demonstrate the efficacy of our approach across a diverse range of datasets and tasks, highlighting its utility in mitigating backdoor triggers. Our results show strong performance in terms of certified accuracy, scalability, and semantic preservation."
}