# CoREN
CoREN consists of three steps: (1) reward generation, (2) reward ensemble training, and (3) offline RL training.
The code is organized by step.

## Set up environment
Create the conda environment from the environment configuration file:
```
conda env create --file COREN.yaml
```
You also need the VirtualHome Unity Simulator build that matches your platform. See the following page:  
https://github.com/xavierpuigf/virtualhome  
Download the VirtualHome Unity Simulator executable and move it under `RL_Agent/Virtualhome/unity_simulator/unity_simulator`.

## Environment Setting
```
cd RL_Agent/Virtualhome
python3 -m pip install -e .
```

## Reward generation
```
cd Reward_Generation
```
You can run the following line to create reward sets using various prompts. Choose a prompt type from [naive, icl, cot].
```
python3 caption_reward.py --prompt {PROMPT_TYPE}
```

To generate a reward using RAG, run the following line. Choose a prompt type from [rag_robot, rag_general].
`rag_robot` is an interactive robot prompt, and `rag_general` is a general prompt.  
The results are saved in `Reward_Generation/Data`.
```
# PROMPT_TYPE: rag_robot or rag_general
python3 rag.py --prompt {PROMPT_TYPE}
```

Then run the following line.
```
# PROMPT_TYPE: rag_robot or rag_general
python3 caption_rag.py --prompt {PROMPT_TYPE}
```

To post-process the generated rewards, run `processing.py`. Processing can be applied to the [naive, icl, cot, rag_robot, rag_general] prompt types.
```
python3 processing.py --prompt {PROMPT_TYPE}
```

To convert a reward set into the training format, run the following line. Here `PROMPT_TYPE` is the name of the file you want to convert.
```
python3 merge.py --prompt {PROMPT_TYPE}
```
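The reward-generation steps above can be sketched as a small dry-run helper that lists the commands for one prompt type in the order this README presents them. The function name `reward_generation_commands` is hypothetical and not part of the repository, and passing the prompt type to `merge.py` as its `--prompt` value is an assumption, since that argument is described as a file name. The helper only builds and prints commands; it does not execute them.

```python
import shlex

# Prompt types listed in this README.
PLAIN_PROMPTS = ("naive", "icl", "cot")
RAG_PROMPTS = ("rag_robot", "rag_general")


def reward_generation_commands(prompt_type):
    """Return the commands for one prompt type, in README order (hypothetical helper)."""
    if prompt_type in PLAIN_PROMPTS:
        scripts = ["caption_reward.py"]
    elif prompt_type in RAG_PROMPTS:
        scripts = ["rag.py", "caption_rag.py"]
    else:
        raise ValueError(f"unknown prompt type: {prompt_type!r}")
    # Assumption: processing.py and merge.py accept the same --prompt value.
    scripts += ["processing.py", "merge.py"]
    return [["python3", script, "--prompt", prompt_type] for script in scripts]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in reward_generation_commands("rag_robot"):
        print(shlex.join(cmd))
```

Run the printed commands from inside `Reward_Generation`, or pass each list to `subprocess.run` once the output looks right.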

## Generate Spatial-Temporal Rewards
Run the following lines to create rewards that satisfy each of the TCS (temporal, contextual, structural) consistency methods, and to ensemble them.  
- `PROMPT_TYPE`: The name of the reward set you want to make consistent.
- `CONSISTENT_METHOD`: The consistency method, one of [temporal, contextual, structural].  
- `ENSEMBLE_NAME`: A name for the ensemble, used when combining the results of multiple prompts.

```
cd Reward_Generation
python3 ensemble.py --prompt {PROMPT_TYPE} --method {CONSISTENT_METHOD} --ensemble {ENSEMBLE_NAME}
```
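As a rough illustration of covering all three consistency methods, the sketch below builds one `ensemble.py` command per method for a given reward set and ensemble name. The helper `ensemble_commands` is hypothetical, and running the script once per method is an assumption drawn from the option list above, not confirmed behavior. It only prints the commands.

```python
import shlex

# Consistency methods listed in this README.
CONSISTENCY_METHODS = ("temporal", "contextual", "structural")


def ensemble_commands(prompt_type, ensemble_name, methods=CONSISTENCY_METHODS):
    """Build one ensemble.py command per consistency method (hypothetical helper)."""
    commands = []
    for method in methods:
        if method not in CONSISTENCY_METHODS:
            raise ValueError(f"unknown consistency method: {method!r}")
        commands.append([
            "python3", "ensemble.py",
            "--prompt", prompt_type,
            "--method", method,
            "--ensemble", ensemble_name,
        ])
    return commands


if __name__ == "__main__":
    # Dry run: "my_ensemble" is a placeholder name, not a value from the repository.
    for cmd in ensemble_commands("cot", "my_ensemble"):
        print(shlex.join(cmd))
```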

  
All files created during reward generation are stored in `Reward_Generation/output`.

## Reward Ensemble training
```
cd Reward_Ensemble
```
You can train the reward ensemble by running the following line.

```
python3 train.py
```
You can evaluate the reward ensemble by running the following line.

```
python3 eval.py
```

## RL Agent training
```
cd Reward_Ensemble
```
You can train the RL agent by running the following line.
```
python3 train.py
```

## CoREN Evaluation

To evaluate the model trained above in the environment, run the following line.

```
python3 Coren_planning.py
```

