# Transductive Learning for Unsupervised Style Transfer

## Environment
* pytorch == 1.2.0
* kenlm
* transformers == 3.5.0
* nltk == 3.4.5
* Elasticsearch

Each experiment was run on a single 16 GB Tesla V100 GPU and takes about 10 hours. Our final results are in the `./outputs/{yelp/gyafc}/` folder.

## Dataset
Rename the data files as described below. Each line of a file is one original sentence.
* Yelp
  * The dataset can be downloaded from https://github.com/fastnlp/style-transformer/tree/master/data/yelp. Name the files `{train/dev/test/reference}.{pos/neg}`.
* GYAFC
  * Please refer to https://github.com/raosudha89/GYAFC-corpus to obtain the data. Name the files `{train/dev/test/reference}.{informal/formal}`. Multiple references are distinguished by a number appended to the name, like `reference.informal.0`, `reference.informal.1`, `reference.informal.2`, `reference.informal.3`.
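
Before preprocessing, it can help to check that the renamed files match the scheme above. The following is a minimal sketch, not part of the repository: the helper names `expected_files` and `missing_files` and the directory `data/yelp` are hypothetical, and it assumes numbered references use the same `0..N-1` suffixes for both styles.

```python
from pathlib import Path

# Style suffixes per dataset, taken from the naming scheme above.
STYLES = {"yelp": ("pos", "neg"), "gyafc": ("informal", "formal")}
SPLITS = ("train", "dev", "test")


def expected_files(dataset, num_refs=1):
    """Return the file names the naming scheme implies for one dataset."""
    styles = STYLES[dataset]
    names = [f"{split}.{style}" for split in SPLITS for style in styles]
    for style in styles:
        if num_refs == 1:
            # A single reference carries no numeric suffix.
            names.append(f"reference.{style}")
        else:
            # Assumption: multiple references are numbered 0..num_refs-1.
            names.extend(f"reference.{style}.{i}" for i in range(num_refs))
    return names


def missing_files(data_dir, dataset, num_refs=1):
    """List the expected files that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected_files(dataset, num_refs)
            if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # "data/yelp" is a placeholder; point it at wherever the files live.
    print(missing_files("data/yelp", "yelp"))
```

An empty list means every expected file is present; for GYAFC with four references per style, call `missing_files(path, "gyafc", num_refs=4)`.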

## Preprocess
```sh
python preprocess.py --dataset [yelp/gyafc]
```

## Train
The detailed training parameters of each dataset are in `config.py`.

```sh
python main.py --dataset [yelp/gyafc]
```
The predictions of each epoch are saved in the `./outputs/[yelp/gyafc]/` folder.

## Retriever
* Dense retriever. Set `config.retriever = dense`.
* Sparse retriever. Set `config.retriever = sparse`. Before training, import the training data into a local Elasticsearch instance (`index=dataset_name`) and make sure that `get_query` in `model/SparseRetriever` returns relevant sentences.
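
As an illustration of the import step for the sparse retriever, here is a minimal sketch that uses the official `elasticsearch` Python client and its `helpers.bulk` function. It is not the repository's own loader: the helper names, the document field `text`, the host URL, and the path `data/yelp/train.pos` are all assumptions, so adjust them to whatever `get_query` in `model/SparseRetriever` actually queries.

```python
def build_actions(sentences, index_name):
    """Yield one bulk-index action per non-empty sentence."""
    for i, line in enumerate(sentences):
        text = line.strip()
        if text:
            # Assumption: sentences are stored under a field named "text".
            yield {"_index": index_name, "_id": i, "_source": {"text": text}}


def index_file(path, index_name, host="http://localhost:9200"):
    """Bulk-index every line of a file into a local Elasticsearch index."""
    # Imported here so build_actions works without the client installed.
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(host)
    with open(path, encoding="utf-8") as f:
        return helpers.bulk(es, build_actions(f, index_name))


if __name__ == "__main__":
    # Placeholder path; the index name follows `index=dataset_name`.
    index_file("data/yelp/train.pos", "yelp")
```

Using the line number as the document id keeps re-imports of the same file idempotent; if several files go into one index, the ids would need a per-file prefix to avoid collisions.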

## Evaluate

To train the evaluation models from scratch, follow these steps.

* Train the BERT classifiers using `evaluator/train_classifier.py`.  Put the model in `evaluator/classifier/[yelp/gyafc]/`
```sh
python evaluator/train_classifier.py --dataset yelp --pretrained_dir $YOUR_PRETRAINED_BERT_DIR --max_len 18
# or
python evaluator/train_classifier.py --dataset gyafc --pretrained_dir $YOUR_PRETRAINED_BERT_DIR --max_len 32
```
* Train the [KenLM](https://github.com/kpu/kenlm) language model
```sh
export kenlm=$YOUR_KENLM_DIR/build/bin/
chmod +x evaluator/train_lm.sh

bash evaluator/train_lm.sh yelp
# or
bash evaluator/train_lm.sh gyafc
```
* Evaluate the results
```sh
python evaluator/evaluator.py --dataset yelp --file outputs/yelp/epoch_10
```
This reports the automatic metrics for `outputs/yelp/epoch_10.pos` and `outputs/yelp/epoch_10.neg`.
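
To compare several epochs, the evaluation command can be repeated per output prefix. The sketch below only builds the same command line shown above for each epoch and runs it on request. It assumes every epoch is saved under the `epoch_N` prefix, as in the `epoch_10` example; the helper names and the `dry_run` switch are hypothetical.

```python
import subprocess
import sys


def evaluator_command(dataset, epoch):
    """Build the evaluator command line for one epoch's output prefix."""
    return [
        sys.executable, "evaluator/evaluator.py",
        "--dataset", dataset,
        # Assumption: outputs follow the epoch_N naming of the example above.
        "--file", f"outputs/{dataset}/epoch_{epoch}",
    ]


def evaluate_epochs(dataset, epochs, dry_run=True):
    """Return the commands for all epochs; execute them unless dry_run."""
    commands = [evaluator_command(dataset, epoch) for epoch in epochs]
    if not dry_run:
        for cmd in commands:
            # Run from the repository root so the relative paths resolve.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in evaluate_epochs("yelp", range(1, 11)):
        print(" ".join(cmd))
```

With `dry_run=True` the commands are only printed, which makes it easy to check the paths before starting a long evaluation run.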