# StyleFlow: Disentangle Latent Representations via Normalizing Flow for Unsupervised Text Style Transfer

## Approach
![model overview.png](pictures%2Fmodel%20overview.png)
***
## Dataset
### [Yelp and IMDb](https://www.yelp.com/dataset/challenge): negative sentiment (0) <--> positive sentiment (1)
**Original dataset**: The original Yelp and IMDb datasets are in the *data/yelp/* and *data/imdb/* directories.

### [GYAFC](https://github.com/raosudha89/GYAFC-corpus): informal text (0) <--> formal text (1)
Because the GYAFC dataset is free of charge only for research purposes, 
we publish only a subset of the test set from the family and relationships domain (data/GYAFC/), 
the outputs (outputs/GYAFC/) of each system (our model and all baselines), 
and the corresponding human references (references/GYAFC/). 
To obtain the training and validation sets, 
please follow the instructions at https://github.com/raosudha89/GYAFC-corpus, 
then name the corpus files for the two styles the same way as the files in the Yelp dataset.

***
## Requirements
- python==3.9

- pytorch==1.13.0

- torchtext==0.12.1

- nltk==3.6.7

- kenlm



## Usage

The hyperparameters for StyleFlow can be found in `main.py`.

Most of them are listed below:

```
    data_path : the path to the datasets
    log_dir : where to save the logging info
    save_path : where to save the checkpoints
    min_freq : the minimum token frequency for building the vocabulary
    max_length : the maximum sentence length
    embed_size : the dimension of the token embedding
    d_model : the dimension of the Transformer d_model parameter
    h : the number of Transformer attention heads
    num_layers : the number of Transformer layers
    batch_size : the training batch size
    lr_F : the learning rate for the Style Transformer
    lr_D : the learning rate for the discriminator
    L2 : the L2 norm regularization factor
    iter_D : the number of discriminator update steps per training iteration
    iter_F : the number of Style Transformer update steps per training iteration
    dropout : the dropout factor for the whole model

    log_steps : the number of steps between logging model info
    eval_steps : the number of steps between model evaluations

    slf_factor : the weight factor for the self reconstruction loss
    cyc_factor : the weight factor for the cycle reconstruction loss 
    content_factor: the weight factor for the content loss
    style_factor : the weight factor for the style loss
```

You can adjust them in the `Config` class in `main.py`.
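As a rough illustration of how such a configuration might look, here is a minimal sketch. It is not the repository's actual code: the class name `ExampleConfig`, the helper `weighted_loss`, and every default value are hypothetical placeholders. Combining the four loss terms as a weighted sum is also an assumption based on the descriptions of the `*_factor` parameters, so check `main.py` for the real definitions.

```python
from dataclasses import dataclass


@dataclass
class ExampleConfig:
    """Hypothetical stand-in for the Config class; all defaults are placeholders."""
    data_path: str = "./data/yelp/"   # path to the datasets
    log_dir: str = "./logs"           # where logging info is written
    save_path: str = "./save"         # where checkpoints are written
    max_length: int = 16              # maximum sentence length
    batch_size: int = 64              # training batch size
    iter_D: int = 10                  # discriminator updates per iteration
    iter_F: int = 5                   # Style Transformer updates per iteration
    slf_factor: float = 0.25          # weight of the self reconstruction loss
    cyc_factor: float = 0.5           # weight of the cycle reconstruction loss
    content_factor: float = 1.0       # weight of the content loss
    style_factor: float = 1.0         # weight of the style loss


def weighted_loss(cfg: ExampleConfig, slf: float, cyc: float,
                  content: float, style: float) -> float:
    """Combine the four loss terms as a weighted sum (an assumed form)."""
    return (cfg.slf_factor * slf
            + cfg.cyc_factor * cyc
            + cfg.content_factor * content
            + cfg.style_factor * style)


# Override only the fields you want to change; the rest keep their defaults.
cfg = ExampleConfig(batch_size=32, style_factor=2.0)
total = weighted_loss(cfg, slf=1.0, cyc=2.0, content=3.0, style=4.0)
```

In this sketch, raising `style_factor` relative to `content_factor` makes the style term count for more in the total, which is the kind of trade-off these weights are presumably meant to control.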



If you want to run the model, use the command:

```shell
python main.py
```
If you want to evaluate the model, use the command:
```shell
cd evaluator
python calculate_all.py
```

## Outputs

Update: You can find the outputs of our model in the "outputs" folder.