数据标注方法比较研究:以依存句法树标注为例(Comparison Study on Data Annotation Approaches: Dependency Tree Annotation as Case Study)
Mingyue Zhou (周明月), Chen Gong (龚晨), Zhenghua Li (李正华), Min Zhang (张民)
Abstract
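The most important considerations in data annotation are data quality and annotation cost. Our survey finds that annotation efforts in natural language processing usually adopt machine annotation with human correction in order to reduce cost; at the same time, few studies rigorously compare different annotation approaches to examine how the choice of approach affects annotation quality and cost. With the help of an experienced annotation team, and taking dependency syntax annotation as a case study, this paper experimentally compares three approaches: machine annotation with human correction, independent annotation by two human annotators, and a newly proposed human-machine independent annotation approach that combines the first two. The comparison yields some preliminary conclusions.
- Anthology ID: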
- 2021.ccl-1.48
- Volume:
- Proceedings of the 20th Chinese National Conference on Computational Linguistics
- Month:
- August
- Year:
- 2021
- Address:
- Huhhot, China
- Venue:
- CCL
- Publisher:
- Chinese Information Processing Society of China
- Pages:
- 525–536
- Language:
- Chinese
- URL:
- https://aclanthology.org/2021.ccl-1.48
- Cite (ACL):
- Mingyue Zhou, Chen Gong, Zhenghua Li, and Min Zhang. 2021. 数据标注方法比较研究:以依存句法树标注为例(Comparison Study on Data Annotation Approaches: Dependency Tree Annotation as Case Study). In Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 525–536, Huhhot, China. Chinese Information Processing Society of China.
- Cite (Informal):
- 数据标注方法比较研究:以依存句法树标注为例(Comparison Study on Data Annotation Approaches: Dependency Tree Annotation as Case Study) (Zhou et al., CCL 2021)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2021.ccl-1.48.pdf
- Data
- Penn Treebank