# Uncertainty Estimation in Transformers

This directory contains code for training models and evaluating their uncertainty estimation (UE) scores.

The uncertainty score we evaluate is **Bayesian Active Learning by Disagreement (BALD)** [(Houlsby et al., 2011)](https://arxiv.org/abs/1112.5745). BALD scores each data point by how much knowing its label would reduce our uncertainty about the model parameters, i.e., the mutual information between the predicted label and the parameters.
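
As an illustration, BALD is commonly approximated from several stochastic forward passes (for example, with dropout kept active at inference time) as the entropy of the averaged prediction minus the average entropy of the individual predictions. The sketch below assumes the class probabilities are already collected in an array of shape `(T, N, C)` (passes, samples, classes); the function name `bald_scores` and this array layout are hypothetical and are not taken from the scripts in this directory.

```python
import numpy as np


def bald_scores(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Approximate BALD from stochastic forward passes.

    probs: class probabilities with shape (T, N, C), where T is the number
    of stochastic passes, N the number of samples, C the number of classes.
    Returns an array of shape (N,) with one score per sample.
    """
    # Entropy of the prediction averaged over the T passes.
    mean_probs = probs.mean(axis=0)
    predictive_entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=-1)

    # Average entropy of the individual passes.
    per_pass_entropy = -(probs * np.log(probs + eps)).sum(axis=-1)
    expected_entropy = per_pass_entropy.mean(axis=0)

    # Their difference is large when the passes disagree with each other.
    return predictive_entropy - expected_entropy


# Toy input: 2 passes, 2 samples, 2 classes.
# Sample 0: both passes agree. Sample 1: the passes fully disagree.
probs = np.array([
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
])
scores = bald_scores(probs)
print(scores)  # first score is close to 0, second is close to ln(2)
```

A sample on which all passes agree gets a score near zero, while a sample on which confident passes contradict each other gets a high score.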

After all models are trained, the UE scores of the full models and the adapter models are compared using two measures:
- **Wasserstein distance (WD)**, also known as the earth mover's distance, measures how much “work” is needed to transform one probability distribution into another. A low WD value means that the two distributions are similar.
- **Kullback–Leibler (KL) divergence** measures how much one probability distribution differs from a reference distribution. A low KL divergence means the two distributions are similar; a value of zero means they are identical.
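
Both measures can be sketched for two one-dimensional arrays of UE scores, one per model. This is a minimal NumPy-only illustration under stated assumptions, not necessarily the computation used in `analyze_results.ipynb`: the names `wasserstein_1d` and `kl_divergence`, the equal-sample-size shortcut for WD, and the histogram binning for KL are all hypothetical choices made here.

```python
import numpy as np


def wasserstein_1d(a: np.ndarray, b: np.ndarray) -> float:
    """1-Wasserstein distance between two equal-size 1D samples.

    For equal-size samples this reduces to the mean absolute difference
    between the sorted values.
    """
    a, b = np.sort(a), np.sort(b)
    if a.shape != b.shape:
        raise ValueError("this shortcut assumes samples of equal size")
    return float(np.abs(a - b).mean())


def kl_divergence(a: np.ndarray, b: np.ndarray,
                  bins: int = 20, eps: float = 1e-10) -> float:
    """KL(P || Q) between histograms of two 1D samples on shared bins."""
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    p, _ = np.histogram(a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(b, bins=bins, range=(lo, hi))
    # Smooth empty bins so the logarithm stays finite, then normalize.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))


# Hypothetical UE scores of a full model and an adapter model.
full_scores = np.array([0.10, 0.20, 0.30, 0.40])
adapter_scores = full_scores + 0.05

wd = wasserstein_1d(full_scores, adapter_scores)
kl_same = kl_divergence(full_scores, full_scores)
kl_shifted = kl_divergence(full_scores, adapter_scores)
print(wd, kl_same, kl_shifted)
```

Note that KL divergence is not symmetric, so the order of the arguments matters, and the histogram-based estimate depends on the number of bins. For samples of unequal size, `scipy.stats.wasserstein_distance` is a ready-made alternative for WD.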

---

Refer to `analyze_results.ipynb` for the resulting calculations.

The training and evaluation scripts can be found in `scripts/ue_scripts`.

The code is based on the [uncertainty_transformers](https://github.com/AIRI-Institute/uncertainty_transformers) framework, which offers uncertainty estimation methods for NLP tasks, including classification.