Yi Wang


2022

DoTAT: A Domain-oriented Text Annotation Tool
Yupian Lin | Tong Ruan | Ming Liang | Tingting Cai | Wen Du | Yi Wang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We propose DoTAT, a domain-oriented text annotation tool. The tool implements functions that are much needed in domain-oriented information extraction. First, the tool supports a multi-person collaborative process with automatic merging and review, which can greatly improve annotation accuracy. Second, the tool provides annotation of events, nested events, and nested entities, which are frequently required in domain-related text structuring tasks. Finally, DoTAT provides visual definition of annotation specifications, automatic batch annotation, and iterative annotation to improve annotation efficiency. Experiments on the ACE2005 dataset show that DoTAT can reduce event annotation time by 19.7% compared with existing annotation tools. The accuracy without review is 84.09%, 1.35% higher than Brat and 2.59% higher than Webanno. With review, the accuracy of DoTAT reaches 93.76%. The demonstration video can be accessed from https://ecust-nlp-docker.oss-cn-shanghai.aliyuncs.com/dotat_demo.mp4. A live demo website is available at https://github.com/FXLP/MarkTool.

2020

Chinese Grammatical Error Correction Based on Hybrid Models with Data Augmentation
Yi Wang | Ruibin Yuan | Yan'gen Luo | Yufang Qin | NianYong Zhu | Peng Cheng | Lihuan Wang
Proceedings of the 6th Workshop on Natural Language Processing Techniques for Educational Applications

A better Chinese Grammatical Error Diagnosis (CGED) system for automatic Grammatical Error Correction (GEC) can benefit foreign learners of Chinese and lower the barriers to learning Chinese. In this paper, we introduce our solution to the CGED2020 Shared Task Grammatical Error Correction in detail. The task aims to detect and correct grammatical errors that occur in essays written by foreign learners of Chinese. Our solution combined data augmentation methods, spelling check methods, and generative grammatical correction methods, and achieved the best recall score in the Top 1 Correction track. Our final result ranked fourth among the participants.

2015

Chinese Word Segmentation based on analogy and majority voting
Zongrong Zheng | Yi Wang | Yves Lepage
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters