# On the Language Neutrality of Pre-trained Multilingual Representations

This package contains scripts for the experiments from the paper *On the
Language Neutrality of Pre-trained Multilingual Representations*.

## Prerequisites

Python 3 with the packages listed in `requirements.txt` is required. The
representation models are downloaded automatically via the Transformers
package. The Udify model can be downloaded from
http://hdl.handle.net/11234/1-3042. Static word embeddings can be downloaded
with the `get_word_embeddings.sh` script.

## Data

The data for language identification, which are also used for language
similarity visualization and for adversarial mBERT finetuning, are attached to
the submission.

The WMT14 test sets used for sentence retrieval can be downloaded with
SacreBLEU (https://github.com/mjpost/sacreBLEU) by using the `--echo` option
for the particular dataset.
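
As a rough illustration of the SacreBLEU step, the following sketch wraps the command-line tool from Python. It assumes that the `sacrebleu` executable is installed and on the `PATH`. The helper names (`sacrebleu_echo_command`, `save_test_set`) and the output file layout are hypothetical and not part of this package. Only the `-t`, `-l` and `--echo` options of SacreBLEU are used.

```python
import subprocess
from pathlib import Path
from typing import List


def sacrebleu_echo_command(test_set: str, lang_pair: str, side: str) -> List[str]:
    """Build the SacreBLEU command that prints one side of a test set."""
    if side not in ("src", "ref"):
        raise ValueError("side must be 'src' or 'ref'")
    return ["sacrebleu", "-t", test_set, "-l", lang_pair, "--echo", side]


def save_test_set(test_set: str, lang_pair: str, out_dir: str) -> None:
    """Write source and reference sentences to <out_dir>/<set>.<pair>.<side>.

    The file naming scheme is an assumption made for this example.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for side in ("src", "ref"):
        target = out / f"{test_set}.{lang_pair}.{side}"
        with target.open("w", encoding="utf-8") as handle:
            # SacreBLEU fetches the test set if it is not cached yet
            # and prints the requested side to standard output.
            subprocess.run(
                sacrebleu_echo_command(test_set, lang_pair, side),
                stdout=handle,
                check=True,
            )
```

Calling `save_test_set("wmt14", "en-de", "data/wmt14")` would then write the English source and German reference sentences to two files. The language pair and the output directory here are only examples.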

The data for word alignment can be downloaded from the following addresses:

* English-Czech: http://hdl.handle.net/11234/1-1804

* English-Swedish: http://hdl.handle.net/11372/LRT-1517

* English-German: https://www-i6.informatik.rwth-aachen.de/goldAlignment

* English-French: http://web.eecs.umich.edu/~mihalcea/wpt/data/English-French.test.tar.gz

* English-Romanian: http://web.eecs.umich.edu/~mihalcea/wpt/data/Romanian-English.test.tar.gz

The data for the MT Quality Estimation task can be downloaded at
http://www.statmt.org/wmt19/qe-task.html.

## Probing tasks

Each probing task is implemented in a Python script named after the task. See
the help message of each script for how to run the experiments.

