@inproceedings{wu-etal-2018-decipherment,
    title = "Decipherment for Adversarial Offensive Language Detection",
    author = "Wu, Zhelun  and
      Kambhatla, Nishant  and
      Sarkar, Anoop",
    editor = "Fi{\v{s}}er, Darja  and
      Huang, Ruihong  and
      Prabhakaran, Vinodkumar  and
      Voigt, Rob  and
      Waseem, Zeerak  and
      Wernimont, Jacqueline",
    booktitle = "Proceedings of the 2nd Workshop on Abusive Language Online ({ALW}2)",
    month = oct,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/iwcs-25-ingestion/W18-5119/",
    doi = "10.18653/v1/W18-5119",
    pages = "149--159",
    abstract = "Automated filters are commonly used by online services to stop users from sending age-inappropriate, bullying messages, or asking others to expose personal information. Previous work has focused on rules or classifiers to detect and filter offensive messages, but these are vulnerable to cleverly disguised plaintext and unseen expressions especially in an adversarial setting where the users can repeatedly try to bypass the filter. In this paper, we model the disguised messages as if they are produced by encrypting the original message using an invented cipher. We apply automatic decipherment techniques to decode the disguised malicious text, which can be then filtered using rules or classifiers. We provide experimental results on three different datasets and show that decipherment is an effective tool for this task."
}