基于半监督学习的中文社交文本事件聚类方法(Semi-supervised Method to Cluster Chinese Events on Social Streams)
Hengrui Guo (郭恒睿), Zhongqing Wang (王中卿), Peifeng Li (李培峰), Qiaoming Zhu (朱巧明)
Abstract
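Event clustering for social media aims to cluster short texts according to their event features. Current event clustering models fall mainly into two groups: unsupervised and supervised. Unsupervised models cluster poorly, while supervised models depend on large amounts of labeled data. To address this, this paper proposes a semi-supervised event clustering model (SemiEC). Starting from a small set of labeled data, the model uses an LSTM to represent events and a linear model to compute text similarity, then performs incremental clustering. The labeled data produced by incremental clustering is used to retrain the model, and once this finishes the uncertain samples are re-clustered. Experiments show that SemiEC improves on the performance of the other models it is compared with.
- Anthology ID: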
- 2020.ccl-1.59
- Volume:
- Proceedings of the 19th Chinese National Conference on Computational Linguistics
- Month:
- October
- Year:
- 2020
- Address:
- Haikou, China
- Editors:
- Maosong Sun (孙茂松), Sujian Li (李素建), Yue Zhang (张岳), Yang Liu (刘洋)
- Venue:
- CCL
- Publisher:
- Chinese Information Processing Society of China
- Pages:
- 634–644
- Language:
- Chinese
- URL:
- https://aclanthology.org/2020.ccl-1.59
- Cite (ACL):
- Hengrui Guo, Zhongqing Wang, Peifeng Li, and Qiaoming Zhu. 2020. 基于半监督学习的中文社交文本事件聚类方法(Semi-supervised Method to Cluster Chinese Events on Social Streams). In Proceedings of the 19th Chinese National Conference on Computational Linguistics, pages 634–644, Haikou, China. Chinese Information Processing Society of China.
- Cite (Informal):
- 基于半监督学习的中文社交文本事件聚类方法(Semi-supervised Method to Cluster Chinese Events on Social Streams) (Guo et al., CCL 2020)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/2020.ccl-1.59.pdf