Mitigating Uncertainty in Document Classification

Xuchao Zhang, Fanglan Chen, Chang-Tien Lu, Naren Ramakrishnan


Abstract
The uncertainty measurement of classifiers’ predictions is especially important in applications such as medical diagnoses that need to ensure limited human resources can focus on the most uncertain predictions returned by machine learning models. However, few existing uncertainty models attempt to improve overall prediction accuracy where human resources are involved in the text classification task. In this paper, we propose a novel neural-network-based model that applies a new dropout-entropy method for uncertainty measurement. We also design a metric learning method on feature representations, which can boost the performance of dropout-based uncertainty methods with smaller prediction variance in accurate prediction trials. Extensive experiments on real-world data sets demonstrate that our method can achieve a considerable improvement in overall prediction accuracy compared to existing approaches. In particular, our model improved the accuracy from 0.78 to 0.92 when 30% of the most uncertain predictions were handed over to human experts in “20NewsGroup” data.
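The hand-over setting described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not define the dropout-entropy score, so the code assumes a generic variant in which each document is classified several times with dropout active and uncertainty is the Shannon entropy of the resulting label votes. The function names and the toy data are hypothetical, and human experts are assumed to label every handed-over document correctly.

```python
import math
from collections import Counter


def vote_entropy(labels):
    """Shannon entropy (in nats) of the label votes from repeated dropout passes."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def majority_label(labels):
    """Most frequent label across the dropout passes."""
    return Counter(labels).most_common(1)[0][0]


def accuracy_with_handoff(trial_labels, gold, handoff_fraction):
    """Overall accuracy when the most uncertain documents go to human experts.

    trial_labels[i] holds the labels predicted for document i in each
    stochastic pass. Handed-over documents are assumed to be labeled
    correctly (an assumption of this sketch).
    """
    n = len(gold)
    n_handoff = int(round(handoff_fraction * n))
    # Rank documents from most to least uncertain.
    order = sorted(range(n), key=lambda i: vote_entropy(trial_labels[i]), reverse=True)
    handed_over = set(order[:n_handoff])
    correct = sum(
        1
        for i in range(n)
        if i in handed_over or majority_label(trial_labels[i]) == gold[i]
    )
    return correct / n


# Toy data (hypothetical): four documents, four dropout passes each.
trial_labels = [
    ["a", "a", "a", "a"],
    ["a", "b", "a", "b"],
    ["b", "b", "b", "a"],
    ["a", "a", "a", "a"],
]
gold = ["a", "b", "a", "a"]

for fraction in (0.0, 0.25, 0.5):
    print(fraction, accuracy_with_handoff(trial_labels, gold, fraction))
```

On the toy data, both model errors fall on documents whose votes disagree, so overall accuracy rises as a larger share of the highest-entropy documents is handed over. This is the effect the abstract reports at a 30% hand-over rate.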
Anthology ID:
N19-1316
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Jill Burstein, Christy Doran, Thamar Solorio
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3126–3136
URL:
https://aclanthology.org/N19-1316
DOI:
10.18653/v1/N19-1316
Cite (ACL):
Xuchao Zhang, Fanglan Chen, Chang-Tien Lu, and Naren Ramakrishnan. 2019. Mitigating Uncertainty in Document Classification. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3126–3136, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Mitigating Uncertainty in Document Classification (Zhang et al., NAACL 2019)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/N19-1316.pdf
Video:
https://preview.aclanthology.org/ingest-2024-clasp/N19-1316.mp4
Code:
xuczhang/UncertainDC
Data:
IMDb Movie Reviews