# "Extract and Generate Multi-way Aligned Corpus for Complete Multi-lingual Neural Machine Translation"

This repository is the official implementation of the paper titled above.

## Requirements

To install requirements:

```setup
pip install -r requirements.txt
```

## Generating a non-English corpus (using ar-zh in OPUS-100 as an example)
- Extract candidate aligned examples:
  - `sh ./scripts/extraction/search_all.sh`
  - `python ./scripts/extraction/extract_para_acc_source_len.py $path_to_ar_zh_dir`

- Generate the final semantically aligned examples:
  - `sh ./scripts/generation/noisy_enzh2zh.sh`
  - `sh ./scripts/generation/train_generation_model_for_enzh2zh.sh`
  - `sh ./scripts/generation/process_data_for_decoding.sh`
  - `sh ./scripts/generation/decoding_new.sh`
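
The ar-zh steps above could be chained in a small wrapper script. The following is only a sketch: it assumes the steps run strictly in the order listed and from the repository root. The `run` and `run_pipeline` helpers, the `DRY_RUN` switch, and the `/path/to/ar-zh` placeholder are hypothetical and not part of this repository. `DRY_RUN` defaults to `1`, so the commands are printed rather than executed.

```shell
#!/bin/sh
# Hypothetical wrapper around the ar-zh steps; not shipped with the repository.
set -eu

# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
# Placeholder path (assumption): replace with your own ar-zh directory.
path_to_ar_zh_dir="${path_to_ar_zh_dir:-/path/to/ar-zh}"

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

run_pipeline() {
    # Extract candidate aligned examples.
    run sh ./scripts/extraction/search_all.sh
    run python ./scripts/extraction/extract_para_acc_source_len.py "$path_to_ar_zh_dir"
    # Generate the final aligned examples.
    run sh ./scripts/generation/noisy_enzh2zh.sh
    run sh ./scripts/generation/train_generation_model_for_enzh2zh.sh
    run sh ./scripts/generation/process_data_for_decoding.sh
    run sh ./scripts/generation/decoding_new.sh
}

run_pipeline
```

Because of `set -e`, a real run (`DRY_RUN=0`) stops at the first failing step, so later steps do not consume incomplete output.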

## Training the standard MNMT model on OPUS-100
- `sh ./scripts/standard-MNMT/pre-process.sh`
- `sh ./scripts/standard-MNMT/start_train.sh`

## Evaluating the standard MNMT model on OPUS-100
- `sh ./scripts/standard-MNMT/decoding_new.sh`

## Training the C-MNMT model on OPUS-100
- `sh ./scripts/C-MNMT/pre-process.sh`
- `sh ./scripts/C-MNMT/start_train.sh`

## Evaluating the C-MNMT model on OPUS-100
- `sh ./scripts/C-MNMT/decoding_new.sh`
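
The standard MNMT and C-MNMT sections follow the same pre-process, train, decode sequence, differing only in the script directory. A minimal sketch of that shared sequence, assuming the scripts are run in the listed order from the repository root: the `run` and `train_and_evaluate` helpers and the `DRY_RUN` switch are hypothetical, and `DRY_RUN` defaults to `1` so nothing is executed unless you opt in.

```shell
#!/bin/sh
# Hypothetical helper; not shipped with the repository.
set -eu

# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# $1 is the script directory name: standard-MNMT or C-MNMT.
train_and_evaluate() {
    model_dir="./scripts/$1"
    run sh "$model_dir/pre-process.sh"   # prepare the data
    run sh "$model_dir/start_train.sh"   # train the model
    run sh "$model_dir/decoding_new.sh"  # decode for evaluation
}

train_and_evaluate standard-MNMT
train_and_evaluate C-MNMT
```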
