基于RoBERTa的中文仇恨言论侦测方法研究(Chinese Hate Speech detection method Based on RoBERTa-WWM)

Xiaojun Rao, Yangsen Zhang, Qilong Jia, Xueyang Liu, 晓俊 饶, 仰森 张, 爽 彭, 启龙 贾, 雪阳 刘


Abstract
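With the spread of the Internet, social media provides a platform for exchanging views, but its virtual and anonymous nature has also intensified the spread of hate speech. Automatic hate speech detection is therefore essential to keeping social media platforms civil. To address this problem, we construct a Chinese hate speech dataset, CHSD, and propose a Chinese hate speech detection model, RoBERTa-CHHSD. The model first uses the RoBERTa pre-trained language model to encode Chinese hate speech text as a sequence and extract textual features. These features are then fed separately into a TextCNN model and a Bi-GRU model, which extract multi-level local semantic features and global inter-sentence dependency information, respectively. The two outputs are fused to capture deeper hate speech features, and the text is classified on that basis to detect Chinese hate speech. Experimental results show that the model achieves an F1 score of 89.12% on the CHSD dataset, an improvement of 1.76% over RoBERTa-WWM, the current best mainstream model.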
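The abstract describes a two-branch design: encoder outputs feed a TextCNN branch (local features) and a Bi-GRU branch (global dependencies), and the two results are fused before classification. The following is a minimal pure-Python sketch of that data flow, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the dimensions, the kernel sizes, the random untrained weights, the use of random vectors in place of real RoBERTa outputs, the use of the final GRU states as the global feature, and concatenation as the fusion step (the abstract says only that the results are fused).

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def rand_matrix(rows, cols):
    """Small random matrix; a stand-in for trained parameters (assumption)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def textcnn_features(tokens, filters):
    """Local branch: slide each filter over k-token windows, apply ReLU,
    then max-pool over time. Returns one feature per filter."""
    feats = []
    for k, weight in filters:  # weight has length k * DIM
        scores = []
        for i in range(len(tokens) - k + 1):
            window = [x for tok in tokens[i:i + k] for x in tok]
            scores.append(max(0.0, sum(w * x for w, x in zip(weight, window))))
        feats.append(max(scores))
    return feats

def gru_last_state(tokens, p, hidden):
    """Global branch (one direction): run a bias-free GRU over the sequence
    and return its final hidden state."""
    h = [0.0] * hidden
    for x in tokens:
        z = [sigmoid(a) for a in matvec(p["Wz"], x + h)]  # update gate
        r = [sigmoid(a) for a in matvec(p["Wr"], x + h)]  # reset gate
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in matvec(p["Wh"], x + gated)]
        h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
    return h

# Hypothetical sizes chosen only to keep the example small.
DIM, HIDDEN, KERNEL_SIZES = 8, 4, (2, 3, 4)
GATES = ("Wz", "Wr", "Wh")

filters = [(k, rand_matrix(1, k * DIM)[0]) for k in KERNEL_SIZES]
gru_fwd = {name: rand_matrix(HIDDEN, DIM + HIDDEN) for name in GATES}
gru_bwd = {name: rand_matrix(HIDDEN, DIM + HIDDEN) for name in GATES}
w_out = rand_matrix(2, len(filters) + 2 * HIDDEN)  # 2 classes: hate / not hate

def classify(tokens):
    """Fuse both branches by concatenation and return class probabilities."""
    local_feats = textcnn_features(tokens, filters)
    global_feats = (gru_last_state(tokens, gru_fwd, HIDDEN)
                    + gru_last_state(tokens[::-1], gru_bwd, HIDDEN))
    fused = local_feats + global_feats  # list concatenation, not addition
    return softmax(matvec(w_out, fused))

# Random vectors stand in for RoBERTa's per-token outputs (assumption).
tokens = rand_matrix(6, DIM)
probs = classify(tokens)
print(probs)
```

In a real system the `tokens` matrix would come from a pre-trained RoBERTa encoder and all weights would be learned end to end; the sketch only shows how the local and global feature vectors are produced in parallel and combined into a single classifier input.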
Anthology ID:
2023.ccl-1.44
Volume:
Proceedings of the 22nd Chinese National Conference on Computational Linguistics
Month:
August
Year:
2023
Address:
Harbin, China
Editors:
Maosong Sun, Bing Qin, Xipeng Qiu, Jing Jiang, Xianpei Han
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
501–511
Language:
Chinese
URL:
https://aclanthology.org/2023.ccl-1.44
Cite (ACL):
Xiaojun Rao, Yangsen Zhang, Qilong Jia, Xueyang Liu, 晓俊 饶, 仰森 张, 爽 彭, 启龙 贾, and 雪阳 刘. 2023. 基于RoBERTa的中文仇恨言论侦测方法研究(Chinese Hate Speech detection method Based on RoBERTa-WWM). In Proceedings of the 22nd Chinese National Conference on Computational Linguistics, pages 501–511, Harbin, China. Chinese Information Processing Society of China.
Cite (Informal):
基于RoBERTa的中文仇恨言论侦测方法研究(Chinese Hate Speech detection method Based on RoBERTa-WWM) (Rao et al., CCL 2023)
PDF:
https://preview.aclanthology.org/naacl24-info/2023.ccl-1.44.pdf