@inproceedings{sharif-etal-2022-bad,
    title = "{M}-{BAD}: A Multilabel Dataset for Detecting Aggressive Texts and Their Targets",
    author = "Sharif, Omar  and
      Hossain, Eftekhar  and
      Hoque, Mohammed Moshiul",
    editor = "Chakraborty, Tanmoy  and
      Akhtar, Md. Shad  and
      Shu, Kai  and
      Bernard, H. Russell  and
      Liakata, Maria  and
      Nakov, Preslav  and
      Srivastava, Aseem",
    booktitle = "Proceedings of the Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situations",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2022.constraint-1.9/",
    doi = "10.18653/v1/2022.constraint-1.9",
    pages = "75--85",
    abstract = "Recently, detection and categorization of undesired (e.g., aggressive, abusive, offensive, hate) content from online platforms has grabbed the attention of researchers because of its detrimental impact on society. Several attempts have been made to mitigate the usage and propagation of such content. However, most past studies were conducted primarily for English, where low-resource languages like Bengali remained out of the focus. Therefore, to facilitate research in this arena, this paper introduces a novel multilabel Bengali dataset (named M-BAD) containing 15650 texts to detect aggressive texts and their targets. Each text of M-BAD went through rigorous two-level annotations. At the primary level, each text is labelled as either aggressive or non-aggressive. In the secondary level, the aggressive texts have been further annotated into five fine-grained target classes: religion, politics, verbal, gender and race. Baseline experiments are carried out with different machine learning (ML), deep learning (DL) and transformer models, where Bangla-BERT acquired the highest weighted $f_1$-score in both detection (0.92) and target identification (0.83) tasks. Error analysis of the models exhibits the difficulty to identify context-dependent aggression, and this work argues that further research is required to address these issues."
}