CroAno : A Crowd Annotation Platform for Improving Label Consistency of Chinese NER Dataset

Baoli Zhang, Zhucong Li, Zhen Gan, Yubo Chen, Jing Wan, Kang Liu, Jun Zhao, Shengping Liu, Yafei Shi


Abstract
In this paper, we introduce CroAno, a web-based crowd annotation platform for Chinese named entity recognition (NER). Besides basic crowd annotation features such as fast tagging and data management, CroAno provides a systematic solution for improving the label consistency of Chinese NER datasets. 1) Disagreement Adjudicator: CroAno uses a multi-dimensional highlight mode to visualize instance-level inconsistent entities and makes the revision process user-friendly. 2) Inconsistency Detector: CroAno employs a detector to locate corpus-level label inconsistency and provides users with an interface to correct inconsistent entities in batches. 3) Prediction Error Analyzer: We decompose the model's entity prediction errors into six fine-grained entity error types. Users can employ this error taxonomy to detect corpus-level inconsistency from a model perspective. To validate the effectiveness of our platform, we use CroAno to revise two public datasets. On the two revised datasets, model performance improves by 1.96% and 2.57% F1, respectively.
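The idea of corpus-level label inconsistency in item 2 can be illustrated with a minimal sketch. This is not CroAno's detector, whose method the abstract does not describe. It assumes a simple heuristic: a mention is flagged when the same surface string receives more than one label across the corpus, or is labeled in some sentences and left unlabeled ("O") in others. The function name, the data layout (text plus `(start, end, label)` spans with exclusive end offsets), and the example sentences are all hypothetical.

```python
from collections import Counter, defaultdict


def find_inconsistent_mentions(corpus):
    """Return mentions whose labels disagree across the corpus.

    corpus: list of (text, entities); entities is a list of
    (start, end, label) character spans, end exclusive.
    This layout is an assumption made for the sketch.
    """
    label_counts = defaultdict(Counter)

    # Pass 1: count the labels assigned to each annotated surface string.
    for text, entities in corpus:
        for start, end, label in entities:
            label_counts[text[start:end]][label] += 1

    # Pass 2: count occurrences of known mentions that no annotated span covers.
    for text, entities in corpus:
        spans = [(s, e) for s, e, _ in entities]
        for mention in list(label_counts):
            pos = text.find(mention)
            while pos != -1:
                end = pos + len(mention)
                covered = any(s <= pos and end <= e for s, e in spans)
                if not covered:
                    label_counts[mention]["O"] += 1
                pos = text.find(mention, pos + 1)

    # A mention is inconsistent if it was seen with more than one label.
    return {m: dict(c) for m, c in label_counts.items() if len(c) > 1}


# Hypothetical toy corpus, invented for illustration only.
corpus = [
    ("我在北京大学读书", [(2, 6, "ORG")]),
    ("北京大学位于北京", [(0, 4, "LOC"), (6, 8, "LOC")]),
    ("他去了北京", []),
]

inconsistent = find_inconsistent_mentions(corpus)
for mention, counts in inconsistent.items():
    print(mention, counts)
```

Grouping by surface string is what makes a batch fix possible: every occurrence of a flagged mention can be reviewed and corrected together. Exact string matching is a deliberate simplification, since it cannot tell apart mentions that are legitimately ambiguous in context.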
Anthology ID:
2021.emnlp-demo.32
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Heike Adel, Shuming Shi
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
275–282
URL:
https://aclanthology.org/2021.emnlp-demo.32
DOI:
10.18653/v1/2021.emnlp-demo.32
Cite (ACL):
Baoli Zhang, Zhucong Li, Zhen Gan, Yubo Chen, Jing Wan, Kang Liu, Jun Zhao, Shengping Liu, and Yafei Shi. 2021. CroAno : A Crowd Annotation Platform for Improving Label Consistency of Chinese NER Dataset. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 275–282, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
CroAno : A Crowd Annotation Platform for Improving Label Consistency of Chinese NER Dataset (Zhang et al., EMNLP 2021)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2021.emnlp-demo.32.pdf
Video:
https://preview.aclanthology.org/dois-2013-emnlp/2021.emnlp-demo.32.mp4