# Cross-Modal Similarity-Based Curriculum Learning for Image Captioning

This file explains how to train the captioning model with similarity-based curriculum learning. Our code is largely based on [self-critical.pytorch](https://github.com/ruotianluo/self-critical.pytorch); see that repository for more details.

## Requirements

- Python3
- PyTorch 1.3+
- cider
- coco-caption
- yacs
- lmdbdict

## Install & Data Download

1. Download the Simi_CL files (you may already have them if you are reading this file).

2. Clone the self-critical.pytorch repository from the link above:

   ```
   git clone https://github.com/ruotianluo/self-critical.pytorch.git
   ```

3. Create the environment and install the required packages:

   ```
   conda env create -f Simi_CL/simi_cl.yml
   conda activate env_name
   ```

   ```
   pip install -r Simi_CL/requirements.txt
   ```

4. Copy the files from Simi_CL into self-critical.pytorch, replacing any files with the same name:

   ```
   cp -r Simi_CL/all_dm_files self-critical.pytorch/
   cp -r Simi_CL/best-configs self-critical.pytorch/
   cp -r Simi_CL/captioning/data self-critical.pytorch/captioning/
   cp -r Simi_CL/captioning/utils self-critical.pytorch/captioning/
   cp -r Simi_CL/tools self-critical.pytorch/
   ```

5. Download the COCO and Flickr30k captions and preprocess them following the [data preprocessing instructions](https://github.com/ruotianluo/self-critical.pytorch/blob/master/data/README.md) in self-critical.pytorch. Make sure the following four files are saved at these paths:
   `data/cocotalk.json`, `data/cocotalk_label.h5`, `data/dataset_flickr30k.json`, `data/f30ktalk_label.h5`.

6. Download the extracted image features from the [shared link](https://drive.google.com/drive/folders/1eCdz62FAVCGogOuNhy87Nmlo5_I0sH2J) provided in self-critical.pytorch. We use the Bottom-Up features: for COCO, download `cocobu_att.tar` and extract it to `data/cocobu_att`; for Flickr30k, download `f30kbu_att.pth` and save it at `data/f30kbu_att.pth`.
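
Before training, it can help to confirm that the data paths named in the steps above are in place. The snippet below is a minimal sketch, not part of the repository: `check_data` is a hypothetical helper name, and it only tests that each path exists, not that its contents are valid.

```shell
# Hypothetical helper (not part of Simi_CL or self-critical.pytorch):
# reports which of the expected data paths exist under a given root.
check_data() {
    root="${1:-.}"   # directory that contains data/, defaults to the current one
    missing=0
    for p in data/cocotalk.json data/cocotalk_label.h5 \
             data/dataset_flickr30k.json data/f30ktalk_label.h5 \
             data/cocobu_att data/f30kbu_att.pth; do
        if [ -e "$root/$p" ]; then
            echo "ok       $p"
        else
            echo "MISSING  $p"
            missing=1
        fi
    done
    # Exit status is 0 only when every path was found.
    return "$missing"
}
```

Running `check_data .` from inside `self-critical.pytorch/` lists each path and returns a non-zero status if any of them is missing.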

## Train & Eval Model
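Run the commands below from the `self-critical.pytorch/` directory. In each command, `$model_id$` is a placeholder for the identifier of your training run.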

1. Train the vanilla model with `train.py`:

   ```
   python tools/train.py --cfg best-configs/coco/transformer_base_coco.yml --id $model_id$
   ```

2. Train the vanilla model with the CL method using `train_cl.py`:

   ```
   python tools/train_cl.py --cfg best-configs/coco/transformer_simi_cl_clip.yml --id $model_id$
   ```

   The model will be saved at `all_ckpts/`.

3. Evaluate the model (COCO with beam size 3 or 5, Flickr30k with beam size 3):

   ```
   python tools/eval.py --dump_images 0 --num_images -1 --model all_ckpts/log_$model_id$/model-best.pth --infos_path all_ckpts/log_$model_id$/infos_$model_id$-best.pkl --language_eval 1 --beam_size 5 --batch_size 100
   ```
   The evaluation files will be saved at `eval_results/`.
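
To evaluate one checkpoint at several beam sizes, the evaluation command can be generated in a loop. The following is a rough sketch under stated assumptions: `eval_cmd` is a hypothetical helper and `my_model` is a made-up run identifier. The flags are the ones used in the command above. The helper only prints the commands, so nothing is executed until you run them yourself.

```shell
# Hypothetical helper: prints the evaluation command for a given
# run identifier and beam size (it does not run the evaluation).
eval_cmd() {
    model_id="$1"
    beam="$2"
    echo python tools/eval.py --dump_images 0 --num_images -1 \
        --model "all_ckpts/log_${model_id}/model-best.pth" \
        --infos_path "all_ckpts/log_${model_id}/infos_${model_id}-best.pkl" \
        --language_eval 1 --beam_size "$beam" --batch_size 100
}

# Print one command per beam size; "my_model" is a placeholder id.
for beam in 3 5; do
    eval_cmd my_model "$beam"
done
```

Piping the printed lines to `sh` from inside `self-critical.pytorch/` would run them in sequence.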