## Multilingual MRC Model

### Introduction
The multilingual MRC model builds on [mBERT](https://github.com/google-research/bert/blob/master/multilingual.md) or
[XLM-100](https://huggingface.co/xlm-mlm-100-1280/tree/main) and adds a Siamese Semantic Disentanglement Model (SSDM)
that explicitly transfers only semantic knowledge to the target language.

### Environment
- GPU       Quadro RTX 6000  24G
- python    3.7.9
- torch     1.7.1
- cuda      11.0
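
To compare a local setup against the versions listed above, a small check along the following lines may help. It is only a sketch: the helper names `installed_versions` and `report` are made up for illustration and are not part of this repository, and a version mismatch does not necessarily mean the code will fail.

```python
import sys

# Versions listed in the Environment section of this README.
EXPECTED = {"python": "3.7.9", "torch": "1.7.1", "cuda": "11.0"}


def installed_versions():
    """Collect the local Python, torch and CUDA versions (None if torch is missing)."""
    versions = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    try:
        import torch
        # Drop local build suffixes such as "+cu110".
        versions["torch"] = str(torch.__version__).split("+")[0]
        # CUDA version torch was built with; None for CPU-only builds.
        versions["cuda"] = torch.version.cuda
    except ImportError:
        versions["torch"] = None
        versions["cuda"] = None
    return versions


def report(installed, expected=EXPECTED):
    """Return one human-readable line per component."""
    lines = []
    for name, want in expected.items():
        have = installed.get(name)
        status = "ok" if have == want else "differs"
        lines.append("{}: expected {}, found {} ({})".format(name, want, have, status))
    return lines


print("\n".join(report(installed_versions())))
```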

### Usage

---
#### Training SSDM
##### mBERT
We provide two types of SSDM models, each with its own training script. Training each model takes about one hour on average.
Set the loss parameters and hyperparameters in the `config_ssdm.py` file, then run the corresponding training script.
        
    python train_ssdm_POS.py --save_prefix "ssdm_output/mbert_ssdm_pos" \
    --train_file "data/SSDM_data/train_data/parallel_sentence.txt" \
    --eval_file ['data/SSDM_data/zh_sts_data/dev.csv', 'data/SSDM_data/zh_sts_data/test.csv'] \
    --ml_token "bert-base-multilingual-cased" \
    --ml_model_path "bert-base-multilingual-cased" \
    --lr 5e-5 \
    --batch_size 20 \
    --n_epoch 2 \
    --ysize 200 \
    --zsize 200

or
    
    python train_ssdm_SP.py --save_prefix "ssdm_output/mbert_ssdm_sp" \
    --train_file "data/SSDM_data/train_data/parallel_sentence.txt" \
    --eval_file ['data/SSDM_data/zh_sts_data/dev.csv', 'data/SSDM_data/zh_sts_data/test.csv'] \
    --ml_token "bert-base-multilingual-cased" \
    --ml_model_path "bert-base-multilingual-cased" \
    --lr 5e-5 \
    --batch_size 100 \
    --n_epoch 2 \
    --ysize 200 \
    --zsize 200
    
##### XLM-100
We provide two types of SSDM models, each with its own training script. Training each model takes about one and a half hours on average.
Set the loss parameters and hyperparameters in the `config_ssdm_xlm.py` file, then run the corresponding training script.
        
    python train_ssdm_xlm_POS.py --save_prefix "ssdm_output/mbert_ssdm_xlm_pos" \
    --train_file "data/SSDM_data/train_data/parallel_sentence.txt" \
    --eval_file ['data/SSDM_data/zh_sts_data/dev.csv', 'data/SSDM_data/zh_sts_data/test.csv'] \
    --ml_token "xlm-mlm-100-1280" \
    --ml_model_path "xlm-mlm-100-1280" \
    --lr 5e-5 \
    --batch_size 16 \
    --n_epoch 2 \
    --ysize 200 \
    --zsize 200

or
    
    python train_ssdm_xlm_SP.py --save_prefix "ssdm_output/mbert_ssdm_xlm_sp" \
    --train_file "data/SSDM_data/train_data/parallel_sentence.txt" \
    --eval_file ['data/SSDM_data/zh_sts_data/dev.csv', 'data/SSDM_data/zh_sts_data/test.csv'] \
    --ml_token "xlm-mlm-100-1280" \
    --ml_model_path "xlm-mlm-100-1280" \
    --lr 5e-5 \
    --batch_size 16 \
    --n_epoch 2 \
    --ysize 200 \
    --zsize 200

#### Fine-tuning MRC model
##### mBERT
Fine-tuning each model takes about two and a half hours on average.

    python run_mrc_ssdm_POS.py \
    --model_type bert \
    --model_name_or_path "bert-base-multilingual-cased" \
    --semantic_model_dir "ssdm_output/mbert_ssdm_pos/best.ckpt" \
    --do_train \
    --do_eval \
    --train_file data/MRC_data/data_squad/train-v1.1.json \
    --predict_file data/MRC_data/data_squad/dev-v1.1.json \
    --learning_rate 2e-5 \
    --num_train_epochs 3 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir mrc_output/mrc_ssdm_pos/ \
    --per_gpu_eval_batch_size=16   \
    --per_gpu_train_batch_size=16   \
    --gradient_accumulation_steps=2

or

    python run_mrc_ssdm_SP.py \
    --model_type bert \
    --model_name_or_path "bert-base-multilingual-cased" \
    --semantic_model_dir "ssdm_output/mbert_ssdm_sp/best.ckpt" \
    --do_train \
    --do_eval \
    --train_file data/MRC_data/data_squad/train-v1.1.json \
    --predict_file data/MRC_data/data_squad/dev-v1.1.json \
    --learning_rate 2e-5 \
    --num_train_epochs 3 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir mrc_output/mrc_ssdm_sp/ \
    --per_gpu_eval_batch_size=16   \
    --per_gpu_train_batch_size=16   \
    --gradient_accumulation_steps=2

##### XLM-100
Fine-tuning each model takes about six and a half hours on average.

    python run_mrc_xlm_ssdm_POS.py \
    --model_type xlm \
    --model_name_or_path "xlm-mlm-100-1280" \
    --semantic_model_dir "ssdm_output/mbert_ssdm_xlm_pos/best.ckpt" \
    --do_train \
    --do_eval \
    --train_file data/MRC_data/data_squad/train-v1.1.json \
    --predict_file data/MRC_data/data_squad/dev-v1.1.json \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir mrc_output/mrc_ssdm_xlm_pos/ \
    --per_gpu_eval_batch_size=8   \
    --per_gpu_train_batch_size=8   \
    --gradient_accumulation_steps=2

or

    python run_mrc_ssdm_SP.py \
    --model_type xlm \
    --model_name_or_path "xlm-mlm-100-1280" \
    --semantic_model_dir "ssdm_output/mbert_ssdm_xlm_sp/best.ckpt" \
    --do_train \
    --do_eval \
    --train_file data/MRC_data/data_squad/train-v1.1.json \
    --predict_file data/MRC_data/data_squad/dev-v1.1.json \
    --learning_rate 3e-5 \
    --num_train_epochs 3 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir mrc_output/mrc_ssdm_xlm_sp/ \
    --per_gpu_eval_batch_size=8   \
    --per_gpu_train_batch_size=8   \
    --gradient_accumulation_steps=2
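
The fine-tuning commands above combine a per-GPU batch size with gradient accumulation, so the number of examples per optimizer step is larger than the per-GPU value. The arithmetic can be sketched as follows, assuming the two flags behave as in the Hugging Face `run_squad.py` example script (an assumption, since the training scripts are not shown here). The function name and the `n_gpu` parameter are illustrative.

```python
def effective_batch_size(per_gpu_train_batch_size, gradient_accumulation_steps, n_gpu=1):
    """Examples consumed per optimizer step, assuming run_squad.py-style semantics."""
    return per_gpu_train_batch_size * gradient_accumulation_steps * n_gpu


# mBERT commands: --per_gpu_train_batch_size=16 --gradient_accumulation_steps=2
print(effective_batch_size(16, 2))  # 32 on a single GPU

# XLM-100 commands: --per_gpu_train_batch_size=8 --gradient_accumulation_steps=2
print(effective_batch_size(8, 2))   # 16 on a single GPU
```

If GPU memory is tighter than on the 24G card listed under Environment, halving the per-GPU batch size while doubling the accumulation steps keeps this product unchanged under the same assumption.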
    
#### Evaluating the MRC model on three datasets
The three datasets are [XQuAD](https://doi.org/10.18653/v1/2020.acl-main.421),
[MLQA](https://doi.org/10.18653/v1/2020.acl-main.653), and [TyDiQA](https://transacl.org/ojs/index.php/tacl/article/view/1929).
##### mBERT
    python eval_mrc_pos.py \
    --model_type bert \
    --model_name_or_path "bert-base-multilingual-cased" \
    --semantic_model_dir "ssdm_output/mbert_ssdm_pos/best.ckpt" \
    --do_eval \
    --output_dir mrc_output/mrc_ssdm_pos/ \
    --per_gpu_eval_batch_size=32

or

    python eval_mrc_sp.py \
    --model_type bert \
    --model_name_or_path "bert-base-multilingual-cased" \
    --semantic_model_dir "ssdm_output/mbert_ssdm_sp/best.ckpt" \
    --do_eval \
    --output_dir mrc_output/mrc_ssdm_sp/ \
    --per_gpu_eval_batch_size=32

##### XLM-100
    python eval_mrc_pos.py \
    --model_type xlm \
    --model_name_or_path "xlm-mlm-100-1280" \
    --semantic_model_dir "ssdm_output/mbert_ssdm_xlm_pos/best.ckpt" \
    --do_eval \
    --output_dir mrc_output/mrc_ssdm_xlm_pos/ \
    --per_gpu_eval_batch_size=16   

or

    python eval_mrc_sp.py \
    --model_type xlm \
    --model_name_or_path "xlm-mlm-100-1280" \
    --semantic_model_dir "ssdm_output/mbert_ssdm_xlm_sp/best.ckpt" \
    --do_eval \
    --output_dir mrc_output/mrc_ssdm_xlm_sp/ \
    --per_gpu_eval_batch_size=16  


### Results
Our baselines are mBERT and XLM-100. We use the same evaluation metrics as [SQuAD v1.1](https://rajpurkar.github.io/SQuAD-explorer/), i.e., F1 and Exact Match (EM). EM measures the percentage of predictions that exactly match any one of the ground-truth answers. F1 measures the token-level overlap between the prediction and the ground-truth answer.
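
For readers unfamiliar with these two metrics, the following is a minimal sketch of how EM and F1 are computed for a single question in the style of the SQuAD v1.1 evaluation. It uses the English normalization (lowercasing, stripping punctuation and the articles a/an/the); the evaluation scripts in this repository and the multilingual benchmarks may normalize text differently per language, so treat it as an illustration rather than the exact scoring code. The function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, ground_truths):
    """1.0 if the normalized prediction equals any normalized ground truth, else 0.0."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(gt) for gt in ground_truths))


def f1_score(prediction, ground_truths):
    """Best token-level F1 between the prediction and any ground truth."""
    pred_tokens = normalize_answer(prediction).split()
    best = 0.0
    for gt in ground_truths:
        gt_tokens = normalize_answer(gt).split()
        # Multiset intersection counts tokens shared by both answers.
        num_same = sum((Counter(pred_tokens) & Counter(gt_tokens)).values())
        if num_same == 0:
            continue
        precision = num_same / len(pred_tokens)
        recall = num_same / len(gt_tokens)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


print(exact_match("The Eiffel Tower", ["Eiffel Tower."]))   # 1.0
print(f1_score("Eiffel Tower in Paris", ["Eiffel Tower"]))  # 2/3: precision 0.5, recall 1.0
```

Dataset-level scores are the averages of these per-question values, reported as percentages.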

![img.png](results.png)
