基于神经网络的半监督CRF中文分词(Semi-supervised CRF Chinese Word Segmentation based on Neural Network)
Zhiyong Luo (罗智勇), Mingming Zhang (张明明), Yujiao Han (韩玉蛟), Zhilin Zhao (赵志琳)
Abstract
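“Word segmentation is one of the fundamental tasks of Chinese information processing. Fully supervised Chinese word segmentation is now relatively mature and performs well in general domains. However, fully supervised methods depend on large-scale annotated corpora and transfer poorly across domains, and they are especially weak at recognizing cross-domain out-of-vocabulary (OOV) words. To alleviate these problems, this paper proposes a semi-supervised Chinese word segmentation framework that makes full use of relatively easy-to-obtain unlabeled target-domain text to achieve cross-domain transfer. It also designs and implements a semi-supervised CRF Chinese word segmentation model based on a word memory network and sequence conditional entropy. Experimental results show that the model improves the F-score and the OOV recall (R_OOV) by up to 2.35% and 12.12%, respectively, on datasets from multiple domains, and achieves the current best results on several of them.”
- Anthology ID: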
- 2022.ccl-1.58
- Volume:
- Proceedings of the 21st Chinese National Conference on Computational Linguistics
- Month:
- October
- Year:
- 2022
- Address:
- Nanchang, China
- Editors:
- Maosong Sun (孙茂松), Yang Liu (刘洋), Wanxiang Che (车万翔), Yang Feng (冯洋), Xipeng Qiu (邱锡鹏), Gaoqi Rao (饶高琦), Yubo Chen (陈玉博)
- Venue:
- CCL
- Publisher:
- Chinese Information Processing Society of China
- Pages:
- 644–655
- Language:
- Chinese
- URL:
- https://aclanthology.org/2022.ccl-1.58
- Cite (ACL):
- Zhiyong Luo, Mingming Zhang, Yujiao Han, and Zhilin Zhao. 2022. 基于神经网络的半监督CRF中文分词(Semi-supervised CRF Chinese Word Segmentation based on Neural Network). In Proceedings of the 21st Chinese National Conference on Computational Linguistics, pages 644–655, Nanchang, China. Chinese Information Processing Society of China.
- Cite (Informal):
- 基于神经网络的半监督CRF中文分词(Semi-supervised CRF Chinese Word Segmentation based on Neural Network) (Luo et al., CCL 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2022.ccl-1.58.pdf
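The abstract lists sequence conditional entropy as one ingredient of the semi-supervised CRF but gives no formula. As a hedged illustration only, the sketch below computes the conditional entropy H(Y|x) of a generic linear-chain CRF, for example over the four BMES segmentation tags. This kind of quantity is commonly minimized on unlabeled text in entropy-regularized semi-supervised CRF training. The sketch assumes plain per-position emission scores and a tag-to-tag transition matrix. The function names, this scoring setup, and any way of weighting the term or combining it with the word memory network are assumptions, not the authors' method. It uses the identity H = log Z − E_p[score(y)], takes the expectation from forward-backward marginals, and includes a brute-force enumerator for checking short inputs.

```python
import itertools
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def crf_conditional_entropy(emissions: Matrix, transitions: Matrix) -> float:
    """H(Y|x) for a linear-chain CRF (hypothetical helper, generic formulation).

    score(y) = sum_t emissions[t][y_t] + sum_{t>0} transitions[y_{t-1}][y_t]
    H = log Z - E_p[score(y)], with E_p computed from forward-backward marginals.
    """
    n, k = len(emissions), len(emissions[0])
    # Forward pass (log space): alpha[t][y] sums over all prefixes ending in tag y at t.
    alpha = [list(emissions[0])]
    for t in range(1, n):
        alpha.append([
            logsumexp([alpha[t - 1][p] + transitions[p][y] for p in range(k)])
            + emissions[t][y]
            for y in range(k)
        ])
    # Backward pass (log space): beta[t][y] sums over all suffixes after tag y at t.
    beta = [[0.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for y in range(k):
            beta[t][y] = logsumexp([
                transitions[y][q] + emissions[t + 1][q] + beta[t + 1][q]
                for q in range(k)
            ])
    log_z = logsumexp(alpha[-1])
    # Expected score under p(y|x): unary marginals times emissions,
    # plus pairwise marginals times transitions.
    expected = 0.0
    for t in range(n):
        for y in range(k):
            mu = math.exp(alpha[t][y] + beta[t][y] - log_z)
            expected += mu * emissions[t][y]
            if t > 0:
                for p in range(k):
                    xi = math.exp(alpha[t - 1][p] + transitions[p][y]
                                  + emissions[t][y] + beta[t][y] - log_z)
                    expected += xi * transitions[p][y]
    return log_z - expected


def brute_force_entropy(emissions: Matrix, transitions: Matrix) -> float:
    """Reference: enumerate every tag sequence (exponential; for checking only)."""
    n, k = len(emissions), len(emissions[0])
    scores = []
    for ys in itertools.product(range(k), repeat=n):
        s = sum(emissions[t][ys[t]] for t in range(n))
        s += sum(transitions[ys[t - 1]][ys[t]] for t in range(1, n))
        scores.append(s)
    log_z = logsumexp(scores)
    # H = -sum p log p, where log p(y) = score(y) - log Z
    return -sum(math.exp(s - log_z) * (s - log_z) for s in scores)


if __name__ == "__main__":
    # With all scores zero, the 4**3 tag sequences of a 3-character input
    # are equally likely, so H = log 64.
    print(round(crf_conditional_entropy([[0.0] * 4] * 3, [[0.0] * 4] * 4), 4))  # 4.1589
```

Working in log space with `logsumexp` keeps long sentences from overflowing. The forward-backward version is linear in sentence length, whereas `brute_force_entropy` is exponential and serves only as a correctness check.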