# Complexity Controlled MT

This repository contains the implementation for the following paper:

Sweta Agrawal and Marine Carpuat, "Controlling Text Complexity in Neural Machine Translation", to appear at EMNLP-IJCNLP 2019.

## Usage Instructions
1. Run `bash setup.sh` ([setup.sh](setup.sh)) to obtain the necessary software.
2. Request the Newsela dataset from [here](https://newsela.com/data/) and download the OPUS corpus from [here](http://opus.nlpl.eu/).
3. Use `PrepareDataFiles.ipynb` to extract the alignments, create the train/dev/test splits, and place the resulting files under `data/`.
4. Training:
```bash
# Monolingual English simplification
bash scripts/main.sh -d -i <iter>

# Multitask model trained on English simplification and out-of-domain bilingual data
bash scripts/main.sh -d -b -i <iter>

# List all other options
bash scripts/main.sh -h
```
5. Evaluation: by default, the model is evaluated on all tasks.

```bash
bash scripts/evaluate.sh -i <iter>
```

The scripts for computing SARI and the Automated Readability Index (ARI) are based on [here](https://github.com/cocoxu/simplification) and [here](https://github.com/mmautner/readability), respectively.

Note: please refer to the supplementary material of our paper (Tables 10 and 11) for the exact statistics of the datasets used for training.
