# CoV-RAG
our paper: Retrieving, Rethinking and Revising: The Chain-of-Verification Can Improve Retrieval Augmented Generation (CoV-RAG)
## Table of Contents
- [Training](#training)
- [Inference](#inference)
- [Evaluation](#evaluation)
# Updates
- 2024.01: Initial release

## Training
This section describes how to train the model.
### Prerequisites
List any requirements that must be met before running the training scripts, such as installed libraries, specific hardware requirements, etc.
### Training Steps
#### 1. Data Processing
Use the `trans_to_vicuna_format.py` script to convert raw data into a format suitable for training, raw data in ./data.
```bash
python train/data_processing.py --input_file ./data/ --output_file ./data/train.jsonl
```
#### 2. Start Training
Run the `train.sh` script to begin the training process.
```bash
sh train.sh
```
## Inference
This section explains how to deploy the model and perform inference.
### Model Service Deployment
- Controller Startup
We use FASTCHAT
First, use the `start_controller.sh` script to start the controller.
```bash
sh start_controller.sh
```
- Worker Startup

After starting the controller, use the start_worker.sh script to start the worker.
```bash
sh start_worker.sh
```
### Inference Process
The inference includes the following stages:
- **Retriever**: Use the `retrieve.sh` script for the initial retrieval.
```bash
sh retrieve.sh
```
- **Generator**: Based on the retrieval results, run `generator.sh` to generate answers.
```bash
sh generator.sh
```
- **Verifier**: Use the `verification.sh` script to verify the generated answers, and repeat the retrieval if necessary.
```bash
sh verification.sh
```
After verification, restart the retrieval process if needed.

## Evaluation
This section describes how to use the evaluation script to assess the performance of the model.
Running the Evaluation Script
Use the `evaluate.sh` script for model evaluation.
```bash
sh evaluate.sh
```
## FAQ
sample!
### Q1: The retrieve.sh script is taking too long to execute. Is this normal?

**A:** The execution time for `retrieve.sh` can vary based on the size of the query set and system performance. If it's taking exceptionally long, consider checking system resource usage and ensure that your machine meets the recommended hardware specifications.
### Q2: What is the expected format for the input data?

**A:** The expected format for the input data depends on the specifics of your model and scripts. Generally, it's a structured format like CSV, JSON, or a specific format required by your model. Refer to the `trans_to_vicuna_format.py` script documentation for more details.

