# Deep-Reinforcement-Learning-Visual-Dialog
Research code for deep reinforcement learning applied to visual dialog.

## Setup and Dependencies
The code is implemented in PyTorch (v1.0). We provide a conda environment file so you can easily set up the environment needed to run this code. To install conda, please follow the instructions here: https://conda.io/projects/conda/en/latest/user-guide/install/index.html

Once you have installed conda, go to the root of this repository and run the following command:
```
conda env create -f multimodal_dialog.yml
```
This creates a conda virtual environment named __multimodal_dialog__. Before you run the code, activate this virtual environment with:
```
source activate multimodal_dialog
```
To use a GPU, you will also need to install
[CUDA 9.0](https://developer.nvidia.com/cuda-90-download-archive)

## Usage
Preprocessed data should be stored under `data/`. The preprocessed visdial data is stored under `data/visdial` and the preprocessed fashion data is stored under `data/fashion`.
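
Before training, it can help to confirm that the expected files are in place. The following is a minimal sketch, not part of this repository: the helper `missing_files` is hypothetical, and the two file names are taken from the fashion training commands in this README. Adjust the list to match your own data.

```python
from pathlib import Path


def missing_files(data_root, relative_paths):
    """Return the relative paths that do not exist as files under data_root."""
    root = Path(data_root)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


if __name__ == "__main__":
    # File names as they appear in the fashion training commands of this README.
    expected = [
        "fashion/chat_processed_params.json",
        "fashion/data_img.h5",
    ]
    for rel in missing_files("data", expected):
        print(f"missing: data/{rel}")
```

An empty result means every listed file was found under `data/`.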

### Training
If you want to train the original supervised learning Q-Bot, run the following command:
```
python main.py -useGPU -trainMode sl-qbot -savePath your/path/to/save/trained_sl_qBot
```
Because the visdial and fashion datasets are structured differently, we provide a separate script to train the model on fashion data:
```
python main_fashion.py -useGPU -trainMode sl-qbot -inputQues data/fashion/chat_processed_data.h5 \
-inputJson data/fashion/chat_processed_params.json \
-inputImg data/fashion/data_img.h5 \
-savePath your/path/to/save/trained_sl_qBot
```
To initialize training on fashion data from a model pretrained on the VisDial dataset:
```
python main_fashion.py -useGPU -trainMode sl-qbot -inputQues data/fashion/chat_processed_data.h5 \
-inputJson data/fashion/chat_processed_params.json \
-inputImg data/fashion/data_img.h5 \
-savePath your/path/to/save/trained_sl_qBot \
-pretrainedVisdialModel \
-qstartFrom your/path/to/pretrained_model.vd
```
To train an answer bot with supervised learning:
```
python main.py -useGPU -trainMode sl-abot -savePath your/path/to/save/trained_sl_aBot
```
RL fine-tuning requires a pretrained question bot and a pretrained answer bot. (RL training on fashion data is not yet supported.)
```
python train.py -useGPU \
-trainMode rl-full-QAf \
-startFrom your/path/to/trained_sl_aBot_model.vd \
-qstartFrom your/path/to/trained_sl_qBot_model.vd \
-savePath your/path/to/save/trained_rl_qaBots
```
### Logging
Following the original [visdial-rl](https://github.com/batra-mlp-lab/visdial-rl) repository, we use visdom to log metrics: training loss, validation loss, and winning rates of the 20-image game. By default, visdom logging is disabled. To enable it, pass the flag `-enableVisdom` and specify the other visdom server settings, including `-visdomServerPort`, `-visdomServer`, and `-visdomEnv`. For details, please refer to `options.py`. To launch a visdom server:
```
python -m visdom.server -p <port>
```
Then you can navigate to `localhost:<port>` to track all plots of the logging values.

## Dataset
The [coco.json](http://cs.stanford.edu/people/karpathy/deepimagesent/caption_datasets.zip) mentioned in Dhruv Batra's [visdial-rl](https://github.com/batra-mlp-lab/visdial-rl) repository follows the COCO data split format from Andrej Karpathy.
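
As a rough illustration of that format, the sketch below groups image ids by split. It assumes the file has the layout of Karpathy's `dataset_coco.json`: a top-level `images` list whose entries carry `split` and `cocoid` fields. The helper names and the sample ids are hypothetical and only for illustration.

```python
import json
from collections import defaultdict


def load_split_file(path):
    """Load a Karpathy-style split file from disk."""
    with open(path, "r") as f:
        return json.load(f)


def image_ids_by_split(karpathy_data):
    """Group COCO image ids by the split each image is assigned to."""
    splits = defaultdict(list)
    for image in karpathy_data["images"]:
        splits[image["split"]].append(image["cocoid"])
    return dict(splits)


# Tiny in-memory stand-in with the assumed shape; the ids are made up.
example = {
    "dataset": "coco",
    "images": [
        {"cocoid": 1, "filename": "a.jpg", "split": "train"},
        {"cocoid": 2, "filename": "b.jpg", "split": "val"},
        {"cocoid": 3, "filename": "c.jpg", "split": "train"},
    ],
}

print(image_ids_by_split(example))
```

With a real file, pass the result of `load_split_file` to `image_ids_by_split` instead of the in-memory example.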

## Acknowledgements
This code is built on top of the visdial-rl repository. Please cite the following repository and paper:
```
@misc{modhe2018visdialrlpytorch,
   author = {Modhe, Nirbhay and Prabhu, Viraj and Cogswell, Michael and Kottur, Satwik and Das, Abhishek and Lee, Stefan and Parikh, Devi and Batra, Dhruv},
   title = {VisDial-RL-PyTorch},
   year = {2018},
   publisher = {GitHub},
   journal = {GitHub repository},
   howpublished = {\url{https://github.com/batra-mlp-lab/visdial-rl.git}}
}

@inproceedings{das2017visdialrl,
  title={Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning},
  author={Abhishek Das and Satwik Kottur and Jos\'e M.F. Moura and
    Stefan Lee and Dhruv Batra},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
  year={2017}
}
```
## TODO
1. Complete the evaluation code for fashion data.
2. Write documentation on how to run evaluation and generate dialogs.
