Abstract
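This paper summarizes the role that question form plays in filtering question corpora and explores the formal features required for question classification. We build a Chinese question classification corpus through manual annotation and, on that basis, run rule-based and statistical classification experiments, iteratively optimizing feature combinations over several rounds to form a feature rule set that gives current question answering a formal basis for classification. In the experiments, a finite-state automaton built on the optimized feature rule set achieves a macro-averaged F1 of 0.94; among the statistical machine learning models, random forest classifies well, reaching a macro-averaged F1 of 0.98, which indicates that formal classification of questions is considerably feasible and accurate.
- Anthology ID: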
- 2020.ccl-1.11
- Volume:
- Proceedings of the 19th Chinese National Conference on Computational Linguistics
- Month:
- October
- Year:
- 2020
- Address:
- Haikou, China
- Editors:
- Maosong Sun (孙茂松), Sujian Li (李素建), Yue Zhang (张岳), Yang Liu (刘洋)
- Venue:
- CCL
- Publisher:
- Chinese Information Processing Society of China
- Pages:
- 107–116
- Language:
- Chinese
- URL:
- https://aclanthology.org/2020.ccl-1.11
- Cite (ACL):
- Jiangtao Li and Gaoqi Rao. 2020. 中文问句的形式分类和资源建设(Formal classification and resource construction of Chinese questions). In Proceedings of the 19th Chinese National Conference on Computational Linguistics, pages 107–116, Haikou, China. Chinese Information Processing Society of China.
- Cite (Informal):
- 中文问句的形式分类和资源建设(Formal classification and resource construction of Chinese questions) (Li & Rao, CCL 2020)
- PDF:
- https://preview.aclanthology.org/landing_page/2020.ccl-1.11.pdf