## Multi-teacher Knowledge Distillation for CTC/Att models
This recipe implements multi-teacher distillation methods for
joint CTC-attention end-to-end ASR systems. The proposed approaches integrate
the error rate metric into the teacher selection rather than focusing solely on the observed losses.
This way, the student is distilled and optimized directly toward the metric that matters for speech recognition.
For details please refer to: https://arxiv.org/abs/2005.09310

### Results with this recipe

| Distillation Strategy | Valid PER | Test PER | Model link | GPUs |
|:---------------------------:| :-----:| :-----:| :-----:| :--------:|
| Weighted | 11.87 | 13.11 | [model](https://drive.google.com/drive/folders/1MHR2AZvCYZr88yUQZTmORCvKJqTsYZAQ?usp=sharing) | 1xV100 16GB |
| Best | 11.93 | 13.15 | [model](https://drive.google.com/drive/folders/1D-3GNh-XzjoU-_6egT3Ns6maCvF-fAJH?usp=sharing) | 1xV100 16GB |

### Extra-Dependencies
Before running this recipe, make sure h5py is installed. If it is not, run:
`pip install h5py`
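
If you prefer to check for the dependency from Python before launching a long run, a minimal sketch could look like the following. The helper name `is_installed` is hypothetical and not part of the recipe; it only relies on the standard library.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be found in the current environment."""
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(module_name) is not None


if not is_installed("h5py"):
    print("h5py is missing; install it with: pip install h5py")
```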

### Training steps
To speed up student distillation from multiple teachers, the procedure is split into three parts: training the teacher models, running inference with the teacher models, and distilling the student.

#### 1. Teacher model training
Distillation requires N trained teacher models. We suggest N=10, as in the referenced paper.

The teacher models can be trained in parallel using `train_teacher.py`.

Example:
```
python train_teacher.py hparams/teachers/tea0.yaml --data_folder /path-to/data_folder
```
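
To launch several teachers, it can help to generate one command per teacher. The sketch below only builds and prints the commands. It assumes the hyperparameter files are named `hparams/teachers/tea0.yaml` to `hparams/teachers/tea9.yaml`, which is inferred from the example above; the helper `teacher_commands` is hypothetical.

```python
import shlex


def teacher_commands(n_teachers, data_folder):
    """Build one `train_teacher.py` command per teacher (assumed tea<i>.yaml naming)."""
    commands = []
    for i in range(n_teachers):
        commands.append(
            [
                "python",
                "train_teacher.py",
                f"hparams/teachers/tea{i}.yaml",  # assumed file naming scheme
                "--data_folder",
                data_folder,
            ]
        )
    return commands


# Print the commands so they can be dispatched to separate GPUs or jobs.
for cmd in teacher_commands(10, "/path-to/data_folder"):
    print(shlex.join(cmd))
```

Each printed line can then be submitted as its own job, since the teachers do not depend on one another.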

#### 2. Run inference on all teacher models
This step runs inference with every teacher model and stores the results on disk using `save_teachers.py`. You only need to set the `tea_models_dir` variable to the path of a txt file. This txt file must contain
a list of paths, one per line, each pointing to a teacher's `model.ckpt`. A file is used so that the setup scales easily to hundreds of teachers. An example of this
file is:

```
results/tea0/1234/save/CKPT+2021-01-21+14-50-32+00/model.ckpt
results/tea1/1234/save/CKPT+2021-01-21+13-55-56+00/model.ckpt
results/tea2/1234/save/CKPT+2021-01-21+14-25-21+00/model.ckpt
results/tea3/1234/save/CKPT+2021-01-21+15-02-32+00/model.ckpt
results/tea4/1234/save/CKPT+2021-01-21+15-47-09+00/model.ckpt
results/tea5/1234/save/CKPT+2021-01-21+16-02-38+00/model.ckpt
results/tea6/1234/save/CKPT+2021-01-21+16-05-33+00/model.ckpt
results/tea7/1234/save/CKPT+2021-01-21+16-03-20+00/model.ckpt
results/tea8/1234/save/CKPT+2021-01-21+16-25-17+00/model.ckpt
results/tea9/1234/save/CKPT+2021-01-21+15-48-42+00/model.ckpt
```

Example:
```
python save_teachers.py hparams/save_teachers.yaml --data_folder /path-to/data_folder --tea_models_dir /path-to/tea_model_paths.txt
```
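
The txt file of teacher checkpoints can be written by hand or generated. The sketch below assumes the directory layout shown in the example file (`results/tea<i>/<seed>/save/CKPT+<timestamp>/model.ckpt`) and keeps the most recent checkpoint folder for each teacher, which may not be the checkpoint you want to use. Both helper functions are hypothetical and not part of the recipe.

```python
from pathlib import Path


def latest_checkpoint_per_teacher(results_dir):
    """Return one model.ckpt path per teacher folder, keeping the newest CKPT dir."""
    latest = {}
    for ckpt in Path(results_dir).glob("tea*/*/save/CKPT*/model.ckpt"):
        teacher = ckpt.relative_to(results_dir).parts[0]
        # The timestamped folder names sort chronologically as plain strings.
        if teacher not in latest or ckpt.parent.name > latest[teacher].parent.name:
            latest[teacher] = ckpt
    # Sorted by teacher folder name (string order, so "tea10" precedes "tea2").
    return [str(latest[teacher]) for teacher in sorted(latest)]


def write_paths_file(paths, out_file):
    """Write one checkpoint path per line."""
    Path(out_file).write_text("\n".join(paths) + "\n")


# Example usage (paths are placeholders):
# write_paths_file(latest_checkpoint_per_teacher("results"), "tea_model_paths.txt")
```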

#### 3. Student distillation
This is the main distillation step, run with `train_kd.py`. The `pretrain` variable can be set to initialize the student from a pre-trained teacher. If it is set to `True`, the path to the corresponding `model.ckpt` must be given in `pretrain_st_dir`. `tea_infer_dir` is also required and must point to the directory holding the teacher inference results. Finally, distillation must be run with exactly the same input CSV files that `save_teachers.py` generated. This ensures that the student's inputs line up with the
stored teacher predictions. Diverging input CSV files can cause incompatible shape errors.

Example:
```
python train_kd.py hparams/train_kd.yaml --data_folder /path-to/data_folder --pretrain_st_dir /path-to/model_directory --tea_infer_dir /path-to/tea_infer_directory
```
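
Because distillation depends on identical CSV files, a quick byte-level comparison before training can catch a mismatch early. This is a minimal sketch using file hashes; the helper names and the example file paths are hypothetical, and the actual CSV locations depend on your setup.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_file_contents(path_a, path_b):
    """True when both files are byte-for-byte identical."""
    return file_digest(path_a) == file_digest(path_b)


# Example usage (placeholder paths):
# assert same_file_contents("teacher_run/train.csv", "student_run/train.csv")
```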

### Distillation strategies
There are three strategies in the current version that can be switched with the option `strategy` in `hparams/train_kd.yaml`.

- **average**: average the losses of all teachers when doing distillation.
- **best**: choose the best teacher based on WER.
- **weighted**: assign weights to the teachers based on WER.
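
The three strategies above can be sketched as different ways of turning per-teacher error rates into weights on the per-teacher distillation losses. This is an illustration only, not the code in `train_kd.py`: the function names are hypothetical, error rates are assumed to be fractions in [0, 1], and the softmax over negative error rates used for `weighted` is an assumption about the weighting scheme.

```python
import math


def teacher_weights(error_rates, strategy):
    """Map per-teacher error rates to weights that sum to 1."""
    n = len(error_rates)
    if strategy == "average":
        # Every teacher contributes equally.
        return [1.0 / n] * n
    if strategy == "best":
        # Only the teacher with the lowest error rate contributes.
        best = min(range(n), key=lambda i: error_rates[i])
        return [1.0 if i == best else 0.0 for i in range(n)]
    if strategy == "weighted":
        # Assumed scheme: softmax over negative error rates,
        # so a lower error rate yields a larger weight.
        scores = [math.exp(-e) for e in error_rates]
        total = sum(scores)
        return [s / total for s in scores]
    raise ValueError(f"Unknown strategy: {strategy}")


def combined_loss(losses, error_rates, strategy):
    """Weighted sum of per-teacher distillation losses."""
    weights = teacher_weights(error_rates, strategy)
    return sum(w * loss for w, loss in zip(weights, losses))
```

For example, `teacher_weights([0.20, 0.15, 0.30], "best")` puts all the weight on the second teacher, while `"weighted"` spreads it across all three with the second teacher receiving the largest share.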


# **About SpeechBrain**
- Website: https://speechbrain.github.io/
- Code: https://github.com/speechbrain/speechbrain/
- HuggingFace: https://huggingface.co/speechbrain/


# **Citing SpeechBrain**
Please cite SpeechBrain if you use it for your research or business.

```bibtex
@misc{speechbrain,
  title={SpeechBrain: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
```