基于词信息嵌入的汉语构词结构识别研究(Chinese Word-Formation Prediction based on Representations of Word-Related Features)

Hua Zheng (郑婳), Yaqi Yan (殷雅琦), Yue Wang (王悦), Damai Dai (代达劢), Yang Liu (刘扬)


Abstract
作为一种意合型语言,汉语中的构词结构刻画了构词成分之间的组合关系,是认知、理解词义的关键。在中文信息处理领域,此前的构词结构识别工作大多沿用句法层面的粗粒度标签,且主要基于上下文等词间信息建模,忽略了语素义、词义等词内信息对构词结构识别的作用。本文采用语言学视域下的构词结构标签体系,构建汉语构词结构及相关信息数据集,提出了一种基于Bi-LSTM和Self-attention的模型,以此来探究词内、词间等多方面信息对构词结构识别的潜在影响和能达到的性能。实验取得了良好的预测效果,准确率77.87%,F1值78.36%;同时,对比测试揭示,词内的语素义信息对构词结构识别具有显著的贡献,而词间的上下文信息贡献较弱且带有较强的不稳定性。该预测方法与数据集,将为中文信息处理的多种任务,如语素和词结构分析、词义识别与生成、语言文字研究与词典编纂等提供新的观点和方案。
Anthology ID:
2021.ccl-1.36
Volume:
Proceedings of the 20th Chinese National Conference on Computational Linguistics
Month:
August
Year:
2021
Address:
Huhhot, China
Editors:
Sheng Li (李生), Maosong Sun (孙茂松), Yang Liu (刘洋), Hua Wu (吴华), Kang Liu (刘康), Wanxiang Che (车万翔), Shizhu He (何世柱), Gaoqi Rao (饶高琦)
Venue:
CCL
SIG:
Publisher:
Chinese Information Processing Society of China
Note:
Pages:
386–397
Language:
Chinese
URL:
https://aclanthology.org/2021.ccl-1.36
DOI:
Bibkey:
Cite (ACL):
Hua Zheng, Yaqi Yan, Yue Wang, Damai Dai, and Yang Liu. 2021. 基于词信息嵌入的汉语构词结构识别研究(Chinese Word-Formation Prediction based on Representations of Word-Related Features). In Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 386–397, Huhhot, China. Chinese Information Processing Society of China.
Cite (Informal):
基于词信息嵌入的汉语构词结构识别研究(Chinese Word-Formation Prediction based on Representations of Word-Related Features) (Zheng et al., CCL 2021)
Copy Citation:
PDF:
https://preview.aclanthology.org/emnlp22-frontmatter/2021.ccl-1.36.pdf