基于深加工语料库的《唐诗三百首》难度分级 (The difficulty classification of ‘Three Hundred Tang Poems’ based on the deep processing corpus)
Yuyu Huang (黄宇宇), Xinyu Chen (陈欣雨), Minxuan Feng (冯敏萱), Yunuo Wang (王禹诺), Beiyuan Wang (王蓓原), Bin Li (李斌)
Abstract
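To support the selection of Tang poems for primary and secondary school textbooks and readers, this paper builds on a deeply processed corpus of Three Hundred Tang Poems (《唐诗三百首》) annotated with word segmentation, part-of-speech tags, and allusion markers, and constructs a novel grading standard based on the readability of the verses. The standard has 4 layers and 8 quantifiable indicators in total: the character layer (interchangeable characters, 通假字), the word layer (two-character words), the sentence layer (special sentence patterns, title length, verse length), and the artistic layer (allusions, other rhetorical devices, descriptive techniques). The 313 poems in the corpus are scored on these 8 indicators, a vector space model is built from the quantified features, and the K-means clustering algorithm groups the poems to correspond to Tang poetry study at three school stages: primary school, junior high school, and senior high school.
- Anthology ID: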
- 2023.ccl-1.43
- Volume:
- Proceedings of the 22nd Chinese National Conference on Computational Linguistics
- Month:
- August
- Year:
- 2023
- Address:
- Harbin, China
- Editors:
- Maosong Sun, Bing Qin, Xipeng Qiu, Jing Jiang, Xianpei Han
- Venue:
- CCL
- SIG:
- Publisher:
- Chinese Information Processing Society of China
- Note:
- Pages:
- 491–500
- Language:
- Chinese
- URL:
- https://aclanthology.org/2023.ccl-1.43
- DOI:
- Cite (ACL):
- Yuyu Huang, Xinyu Chen, Minxuan Feng, Yunuo Wang, Beiyuan Wang, and Bin Li. 2023. 基于深加工语料库的《唐诗三百首》难度分级 (The difficulty classification of ‘Three Hundred Tang Poems’ based on the deep processing corpus). In Proceedings of the 22nd Chinese National Conference on Computational Linguistics, pages 491–500, Harbin, China. Chinese Information Processing Society of China.
- Cite (Informal):
- 基于深加工语料库的《唐诗三百首》难度分级 (The difficulty classification of ‘Three Hundred Tang Poems’ based on the deep processing corpus) (Huang et al., CCL 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2023.ccl-1.43.pdf
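
The pipeline summarized in the abstract (score each poem on 8 indicators, treat the scores as a vector, cluster with K-means into 3 groups) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the feature names, the six toy score vectors, the deterministic farthest-point initialization, the use of unscaled squared Euclidean distance, and the rule that maps clusters to school stages by total centroid score. The abstract does not state the paper's actual scoring scale, initialization, or stage-mapping rule.

```python
from typing import List, Sequence, Tuple

# Hypothetical feature order, mirroring the 8 indicators named in the abstract.
FEATURES = [
    "interchangeable_chars", "two_char_words", "special_patterns",
    "title_length", "verse_length", "allusions",
    "other_rhetoric", "descriptive_techniques",
]

def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def init_centroids(points: List[Sequence[float]], k: int) -> List[List[float]]:
    """Deterministic farthest-point initialization (an assumption, chosen
    so the toy example is reproducible without a random seed)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids

def kmeans(points: List[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd-style K-means: assign points, recompute centroids, repeat."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean_vector(members)
    return labels, centroids

def assign_stages(labels: List[int], centroids: List[List[float]]) -> List[str]:
    """Map clusters to stages by ascending total centroid score
    (assumes a higher score means a harder poem)."""
    names = ["primary", "junior high", "senior high"]
    order = sorted(range(len(centroids)), key=lambda c: sum(centroids[c]))
    stage_of = {cluster: names[rank] for rank, cluster in enumerate(order)}
    return [stage_of[lab] for lab in labels]

# Invented toy scores for six poems, one row per poem, columns as in FEATURES.
poems = [
    [0, 1, 0, 1, 1, 0, 0, 1],
    [0, 1, 1, 1, 1, 0, 1, 1],
    [1, 3, 2, 2, 3, 2, 2, 2],
    [1, 2, 2, 3, 3, 2, 3, 2],
    [3, 5, 4, 4, 5, 5, 4, 5],
    [2, 5, 5, 4, 5, 6, 5, 4],
]

labels, centroids = kmeans(poems, k=3)
stages = assign_stages(labels, centroids)
print(stages)
```

With real data, the indicators would likely need to be put on a comparable scale before clustering, since raw counts such as verse length could otherwise dominate the distance.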