Humaid Ali Alblooshi
2025
Uncertainty-driven Partial Diacritization for Arabic Text
Humaid Ali Alblooshi, Artem Shelmanov, Hanan Aldarmaki
Proceedings of the 2nd Workshop on Uncertainty-Aware NLP (UncertaiNLP 2025)
We present an uncertainty-based approach to Partial Diacritization (PD) for Arabic text. We evaluate three uncertainty metrics for this task: Softmax Response, BALD via MC-dropout, and Mahalanobis Distance. We further introduce a lightweight Confident Error Regularizer to improve model calibration. Our preliminary exploration illustrates possible ways to use uncertainty estimation to selectively retain or discard diacritics in Arabic text, along with an analysis of how well the uncertainty scores correlate with diacritic error rates. For instance, words with high diacritic error rates tend to receive higher uncertainty scores at inference time, so the model can be used to detect them. On the Tashkeela dataset, the method maintains a low Diacritic Error Rate while reducing the number of visible diacritics in the text by up to 50% with thresholding-based retention.
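To make thresholding-based retention concrete, the following is a minimal sketch, not the authors' implementation. It scores each predicted diacritic with a Softmax Response style uncertainty, taken here as one minus the maximum softmax probability, and keeps the diacritic only when that score is at or below a threshold. The direction of the rule (keep confident predictions, drop uncertain ones), the function names, the threshold value, and the toy logits are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_response_uncertainty(logits):
    """Uncertainty as 1 - max softmax probability (assumed definition)."""
    return 1.0 - max(softmax(logits))


def partially_diacritize(chars, diacritics, logits_per_char, threshold):
    """Keep a predicted diacritic only if its uncertainty is <= threshold.

    chars:           base characters of the text
    diacritics:      predicted diacritic for each character ("" for none)
    logits_per_char: model logits over diacritic classes, one list per character
    threshold:       hypothetical cut-off; larger values retain more diacritics
    """
    out = []
    for ch, mark, logits in zip(chars, diacritics, logits_per_char):
        uncertain = softmax_response_uncertainty(logits) > threshold
        out.append(ch if uncertain else ch + mark)
    return "".join(out)


# Toy example: three letters, each predicted to carry a fatha (U+064E).
chars = ["\u0643", "\u062A", "\u0628"]
diacritics = ["\u064E", "\u064E", "\u064E"]
logits = [
    [5.0, 0.0, 0.0],  # confident prediction
    [0.1, 0.0, 0.0],  # near-uniform distribution, uncertain
    [4.0, 0.0, 0.0],  # confident prediction
]
print(partially_diacritize(chars, diacritics, logits, threshold=0.5))
```

In this toy input the middle letter's prediction is close to uniform, so its diacritic is dropped while the other two are kept. Sweeping the threshold trades the share of visible diacritics against the error rate of the ones that remain; BALD or Mahalanobis Distance scores could be substituted for the uncertainty function without changing the retention rule.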