基于词信息嵌入的汉语构词结构识别研究(Chinese Word-Formation Prediction based on Representations of Word-Related Features)
Hua Zheng (郑婳), Yaqi Yan (殷雅琦), Yue Wang (王悦), Damai Dai (代达劢), Yang Liu (刘扬)
Abstract
作为一种意合型语言,汉语中的构词结构刻画了构词成分之间的组合关系,是认知、理解词义的关键。在中文信息处理领域,此前的构词结构识别工作大多沿用句法层面的粗粒度标签,且主要基于上下文等词间信息建模,忽略了语素义、词义等词内信息对构词结构识别的作用。本文采用语言学视域下的构词结构标签体系,构建汉语构词结构及相关信息数据集,提出了一种基于Bi-LSTM和Self-attention的模型,以此来探究词内、词间等多方面信息对构词结构识别的潜在影响和能达到的性能。实验取得了良好的预测效果,准确率77.87%,F1值78.36%;同时,对比测试揭示,词内的语素义信息对构词结构识别具有显著的贡献,而词间的上下文信息贡献较弱且带有较强的不稳定性。该预测方法与数据集,将为中文信息处理的多种任务,如语素和词结构分析、词义识别与生成、语言文字研究与词典编纂等提供新的观点和方案。- Anthology ID:
- 2021.ccl-1.36
- Volume:
- Proceedings of the 20th Chinese National Conference on Computational Linguistics
- Month:
- August
- Year:
- 2021
- Address:
- Huhhot, China
- Editors:
- Sheng Li (李生), Maosong Sun (孙茂松), Yang Liu (刘洋), Hua Wu (吴华), Kang Liu (刘康), Wanxiang Che (车万翔), Shizhu He (何世柱), Gaoqi Rao (饶高琦)
- Venue:
- CCL
- SIG:
- Publisher:
- Chinese Information Processing Society of China
- Note:
- Pages:
- 386–397
- Language:
- Chinese
- URL:
- https://aclanthology.org/2021.ccl-1.36
- DOI:
- Cite (ACL):
- Hua Zheng, Yaqi Yan, Yue Wang, Damai Dai, and Yang Liu. 2021. 基于词信息嵌入的汉语构词结构识别研究(Chinese Word-Formation Prediction based on Representations of Word-Related Features). In Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 386–397, Huhhot, China. Chinese Information Processing Society of China.
- Cite (Informal):
- 基于词信息嵌入的汉语构词结构识别研究(Chinese Word-Formation Prediction based on Representations of Word-Related Features) (Zheng et al., CCL 2021)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/2021.ccl-1.36.pdf