Mayank Gupta
2020
Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution
David Q. Sun | Hadas Kotek | Christopher Klein | Mayank Gupta | William Li | Jason D. Williams
Proceedings of the 28th International Conference on Computational Linguistics
This paper develops and implements a scalable methodology for (a) estimating the noisiness of labels produced by a typical crowdsourcing semantic annotation task, and (b) reducing the resulting error of the labeling process by as much as 20-30% in comparison to other common labeling strategies. Importantly, this new approach to the labeling process, which we name Dynamic Automatic Conflict Resolution (DACR), does not require a ground truth dataset and is instead based on inter-project annotation inconsistencies. This makes DACR not only more accurate but also available to a broad range of labeling tasks. In what follows we present results from a text classification task performed at scale for a commercial personal assistant, and evaluate the inherent ambiguity uncovered by this annotation strategy as compared to other common labeling strategies.
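The abstract describes DACR only at a high level, so the sketch below is a generic dynamic conflict-resolution loop rather than the paper's actual procedure: additional annotators are recruited for an item only while its existing labels conflict, up to a fixed budget. The function name, the `request_label` callback, and the parameters (`min_labels`, `max_labels`, the agreement threshold) are hypothetical illustration, not taken from the paper.

```python
"""Minimal sketch of dynamic conflict resolution for crowd labeling.

Not the DACR procedure itself (the abstract does not specify it); this
only illustrates requesting extra annotations when annotators disagree.
"""
from collections import Counter
from typing import Callable, List, Optional, Tuple


def label_with_dynamic_resolution(
    item: str,
    request_label: Callable[[str], str],  # asks one more annotator for a label
    min_labels: int = 3,
    max_labels: int = 7,
    agreement: float = 2 / 3,
) -> Tuple[Optional[str], List[str]]:
    """Collect labels until a qualified majority is reached or the budget runs out."""
    labels = [request_label(item) for _ in range(min_labels)]
    while True:
        top_label, top_count = Counter(labels).most_common(1)[0]
        if top_count / len(labels) >= agreement:
            return top_label, labels          # consensus reached
        if len(labels) >= max_labels:
            return None, labels               # unresolved: flag as ambiguous
        labels.append(request_label(item))    # recruit one more annotator


if __name__ == "__main__":
    # Toy example with a scripted "crowd" that disagrees at first.
    canned = iter(["weather", "sports", "weather", "weather"])
    label, votes = label_with_dynamic_resolution(
        "will it rain at the game?", lambda _: next(canned)
    )
    print(label, votes)
```

Items that exhaust the budget without reaching agreement are returned unresolved, which is one way to surface the kind of inherent ambiguity the abstract refers to.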
2008
Bengali and Hindi to English CLIR Evaluation
Debasis Mandal | Sandipan Dandapat | Mayank Gupta | Pratyush Banerjee | Sudeshna Sarkar
Proceedings of the 2nd workshop on Cross Lingual Information Access (CLIA) Addressing the Information Need of Multilingual Societies
Co-authors
- Debasis Mandal 1
- Sandipan Dandapat 1
- Pratyush Banerjee 1
- Sudeshna Sarkar 1
- David Q. Sun 1