基于RoBERTa的中文仇恨言论侦测方法研究(Chinese Hate Speech detection method Based on RoBERTa-WWM)
Xiaojun Rao (饶晓俊), Yangsen Zhang (张仰森), Shuang Peng (彭爽), Qilong Jia (贾启龙), Xueyang Liu (刘雪阳)
Abstract
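“With the spread of the internet, social media provides a platform for exchanging views, but its virtual and anonymous nature has also intensified the spread of hate speech, so automatically detecting hate speech is essential to keeping social media platforms civil. To address this problem, we construct a Chinese hate speech dataset, CHSD, and propose a Chinese hate speech detection model, RoBERTa-CHHSD. The model first uses the RoBERTa pre-trained language model to serialize Chinese hate speech text and extract textual feature information. It then feeds these features into a TextCNN model and a Bi-GRU model separately, extracting multi-level local semantic features and global inter-sentence dependency information. Finally, it fuses the two outputs to capture deeper hate speech features and classifies the text, thereby detecting Chinese hate speech. Experimental results show that the model achieves an F1 score of 89.12% on the CHSD dataset, an improvement of 1.76% over RoBERTa-WWM, the current best mainstream model.”
- Anthology ID: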
- 2023.ccl-1.44
- Volume:
- Proceedings of the 22nd Chinese National Conference on Computational Linguistics
- Month:
- August
- Year:
- 2023
- Address:
- Harbin, China
- Editors:
- Maosong Sun, Bing Qin, Xipeng Qiu, Jing Jiang, Xianpei Han
- Venue:
- CCL
- SIG:
- Publisher:
- Chinese Information Processing Society of China
- Note:
- Pages:
- 501–511
- Language:
- Chinese
- URL:
- https://aclanthology.org/2023.ccl-1.44
- DOI:
- Cite (ACL):
- Xiaojun Rao, Yangsen Zhang, Qilong Jia, Xueyang Liu, 晓俊 饶, 仰森 张, 爽 彭, 启龙 贾, and 雪阳 刘. 2023. 基于RoBERTa的中文仇恨言论侦测方法研究(Chinese Hate Speech detection method Based on RoBERTa-WWM). In Proceedings of the 22nd Chinese National Conference on Computational Linguistics, pages 501–511, Harbin, China. Chinese Information Processing Society of China.
- Cite (Informal):
- 基于RoBERTa的中文仇恨言论侦测方法研究(Chinese Hate Speech detection method Based on RoBERTa-WWM) (Rao et al., CCL 2023)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2023.ccl-1.44.pdf
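
The two-branch design described in the abstract (encoder features fed in parallel to a TextCNN and a Bi-GRU, then fused for classification) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class name `FusionClassifier`, the filter counts, kernel sizes, GRU width, and the choice of concatenation as the fusion step are all assumptions made for illustration, and random tensors stand in for the RoBERTa token features so the example runs without downloading a model.

```python
import torch
import torch.nn as nn


class FusionClassifier(nn.Module):
    """Hypothetical two-branch head over token-level encoder features."""

    def __init__(self, hidden=768, n_filters=64, kernel_sizes=(2, 3, 4),
                 gru_hidden=128, n_classes=2):
        super().__init__()
        # TextCNN branch: one 1-D convolution per kernel size (local features).
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, n_filters, k) for k in kernel_sizes
        )
        # Bi-GRU branch: reads the sequence in both directions (global context).
        self.gru = nn.GRU(hidden, gru_hidden, batch_first=True,
                          bidirectional=True)
        fused_dim = n_filters * len(kernel_sizes) + 2 * gru_hidden
        self.classifier = nn.Linear(fused_dim, n_classes)

    def forward(self, token_features):
        # token_features: (batch, seq_len, hidden)
        x = token_features.transpose(1, 2)  # (batch, hidden, seq_len) for Conv1d
        # Max-pool each convolution's output over the time dimension.
        cnn_out = torch.cat(
            [torch.relu(conv(x)).max(dim=2).values for conv in self.convs],
            dim=1,
        )
        # h_n holds the final hidden state of each direction: (2, batch, gru_hidden).
        _, h_n = self.gru(token_features)
        gru_out = torch.cat([h_n[0], h_n[1]], dim=1)
        # Assumed fusion step: concatenate the two branch outputs.
        fused = torch.cat([cnn_out, gru_out], dim=1)
        return self.classifier(fused)


# Random tensors stand in for RoBERTa output: 4 texts, 32 tokens, 768 dims.
features = torch.randn(4, 32, 768)
logits = FusionClassifier()(features)
print(logits.shape)
```

In a real pipeline, `features` would be the last hidden states of a pretrained Chinese RoBERTa encoder, and the logits would be trained with a cross-entropy loss against hate / non-hate labels.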