# SimLLM: Detecting Sentences Generated by Large Language Models Using Similarity between the Generation and its Re-Generation

## Requirements

1. **API Keys**
   - Obtain API keys for the corresponding models and insert them into the `SimLLM.py` file:
     - ChatGPT: [OpenAI API](https://openai.com/index/openai-api/)
     - Gemini: [Google Gemini API](https://ai.google.dev/gemini-api/docs/api-key)
     - Other LLMs: [Together API](https://api.together.ai/)

2. **Dependencies**
   - Install the required packages:
     `pip install -r requirements.txt`

## Usage

To run the script, use the following command:

`python SimLLM.py`

### Parameters

- `LLMs`: List of large language models to use. Available models include 'ChatGPT', 'Yi', 'OpenChat', 'Gemini', 'LLaMa', 'Phi', 'Mixtral', 'QWen', 'OLMO', 'WizardLM', and 'Vicuna'. Default is `['ChatGPT', 'Yi', 'OpenChat']`.
- `train_indexes`: List of indexes (positions in `LLMs`) of the models used for training. Default is `[0, 1, 2]`.
- `test_indexes`: List of indexes (positions in `LLMs`) of the models used for testing. Default is `[0]`.
- `num_samples`: Number of samples. Default is 5000.
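
The sketch below illustrates one way these parameters could fit together. It is not the code in `SimLLM.py`. It assumes that `train_indexes` and `test_indexes` are positions in the `LLMs` list, which is what the defaults suggest. The helper names `build_parser` and `resolve_models` are hypothetical.

```python
import argparse

# Model names copied from the parameter list above.
AVAILABLE = ['ChatGPT', 'Yi', 'OpenChat', 'Gemini', 'LLaMa', 'Phi',
             'Mixtral', 'QWen', 'OLMO', 'WizardLM', 'Vicuna']


def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--LLMs', nargs='+',
                        default=['ChatGPT', 'Yi', 'OpenChat'])
    parser.add_argument('--train_indexes', nargs='+', type=int,
                        default=[0, 1, 2])
    parser.add_argument('--test_indexes', nargs='+', type=int, default=[0])
    parser.add_argument('--num_samples', type=int, default=5000)
    return parser


def resolve_models(args):
    """Map index lists to model names (assumption: indexes point into LLMs)."""
    unknown = [name for name in args.LLMs if name not in AVAILABLE]
    if unknown:
        raise ValueError(f"Unknown model(s): {unknown}")
    for idx in args.train_indexes + args.test_indexes:
        if not 0 <= idx < len(args.LLMs):
            raise IndexError(f"Index {idx} is out of range for {args.LLMs}")
    train = [args.LLMs[i] for i in args.train_indexes]
    test = [args.LLMs[i] for i in args.test_indexes]
    return train, test
```

Under this reading, passing `--LLMs ChatGPT` on its own would leave the default `train_indexes` of `[0, 1, 2]` out of range, so it is safest to set all three flags together when changing the model list.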

### Examples

- Running with default parameters:
  `python SimLLM.py`

- Running with customized parameters:
  `python SimLLM.py --LLMs ChatGPT --train_indexes 0 --test_indexes 0`

## Dataset

The `dataset.csv` file contains human-written texts and texts generated by 12 large language models:
ChatGPT, GPT-4o, Yi, OpenChat, Gemini, LLaMa, Phi, Mixtral, QWen, OLMO, WizardLM, and Vicuna.
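
The column layout of `dataset.csv` is not described here, so a quick look at the header before use can help. The following is a minimal sketch using only the standard library. The helper name `summarize_csv` is hypothetical, and the sketch makes no assumption about column names.

```python
import csv


def summarize_csv(handle):
    """Return (header, number_of_data_rows) for an open CSV file handle."""
    reader = csv.reader(handle)
    header = next(reader, [])          # first row is treated as the header
    n_rows = sum(1 for _ in reader)    # remaining rows are data records
    return header, n_rows
```

To apply it to the dataset, open the file with `open("dataset.csv", newline="", encoding="utf-8")` and pass the handle to `summarize_csv`. The UTF-8 encoding is an assumption.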

## Acknowledgements

- BARTScore: [BARTScore GitHub Repository](https://github.com/neulab/BARTScore)
