# Logistic regression examples

## Available datasets

* SNIPS dataset
  * Coucke et al. "Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces", 2018
  * Uses pre-processed data obtained from BERT sentence embeddings
  * See `snips_preprocessed/preprocess.py` for details
* Twitter dataset
  * Uses pre-processed data obtained from BERT sentence embeddings (same as SNIPS)
* YouTube dataset
  * Uses pre-processed data obtained from BERT sentence embeddings (same as SNIPS)

## Usage

```
$ python3 examples/logistic_regession_iris.py --generate_keys --heaan_preset FGb --model_type multinomial
```

### Arguments

* `generate_keys`: Whether to generate new keys or reuse previously generated keys. Defaults to `False`.
* `heaan_preset`: Parameter preset name. Defaults to `FGb`.
* `with_piheaan`: Whether to run with piheaan or real heaan. Defaults to `False` (real heaan).
* `model_type`: Logistic regression model type - `ovr` or `multinomial`. Defaults to `multinomial`.
* `batch_size`: Batch size. Must be a multiple of `context.shape[0]`. Defaults to `0`, which is effectively treated as `128`.
* `num_epoch`: Number of epochs.
* `encrypted_train`: Whether to train with encrypted data or not. Defaults to `False`.
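As an illustration of how the arguments above fit together, here is a minimal `argparse` sketch. It is not the example script's actual parser: the function names `build_parser` and `resolve_batch_size`, and the `unit` parameter standing in for `context.shape[0]`, are hypothetical. The default for `num_epoch` is not documented here, so the sketch leaves it unset.

```python
import argparse

# Assumption: a batch size of 0 is mapped to 128, as described in the list above.
DEFAULT_BATCH_SIZE = 128


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(description="Logistic regression example (sketch)")
    parser.add_argument("--generate_keys", action="store_true",
                        help="generate new keys instead of reusing existing ones")
    parser.add_argument("--heaan_preset", default="FGb",
                        help="parameter preset name")
    parser.add_argument("--with_piheaan", action="store_true",
                        help="run with piheaan instead of real heaan")
    parser.add_argument("--model_type", choices=["ovr", "multinomial"],
                        default="multinomial", help="logistic regression model type")
    parser.add_argument("--batch_size", type=int, default=0,
                        help="batch size; 0 falls back to the default")
    # The default number of epochs is not documented, so none is assumed here.
    parser.add_argument("--num_epoch", type=int, default=None,
                        help="number of epochs")
    parser.add_argument("--encrypted_train", action="store_true",
                        help="train on encrypted data")
    return parser


def resolve_batch_size(batch_size: int, unit: int) -> int:
    """Map 0 to the default and check divisibility by `unit`.

    `unit` is a stand-in for `context.shape[0]`.
    """
    if batch_size == 0:
        batch_size = DEFAULT_BATCH_SIZE
    if batch_size % unit != 0:
        raise ValueError(f"batch_size {batch_size} is not a multiple of {unit}")
    return batch_size


# Parse the same flags as the usage command above.
args = build_parser().parse_args(
    ["--generate_keys", "--heaan_preset", "FGb", "--model_type", "multinomial"]
)
print(args.model_type, resolve_batch_size(args.batch_size, unit=128))
```

Boolean options are modeled as `store_true` flags because the usage command passes `--generate_keys` without a value.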

### SNIPS dataset

For the logistic regression experiments with the SNIPS dataset, you may need to download and pre-process the data first.
```
$ python3 examples/snips_preprocessed/download.py
$ python3 examples/snips_preprocessed/preprocess.py
```
The above commands will give you 9 files in `examples/snips_preprocessed`:
* `train.txt`, `val.txt`, `test.txt`: raw data
* `train_xy.txt`, `val_xy.txt`, `test_xy.txt`: label & sentence data
* `train.csv`, `val.csv`, `test.csv`: pre-processed data (sentence embeddings)

The whole process should take under 20 minutes.
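
To confirm that both scripts finished, a quick check along these lines can list any files that are still missing. This is a minimal sketch, not part of the repository: the helper name `missing_snips_files` is hypothetical, and the default path assumes the check is run from the repository root.

```python
from pathlib import Path

# File names taken from the list above: raw, label & sentence, and embedding files.
EXPECTED_FILES = [
    f"{split}{suffix}"
    for suffix in (".txt", "_xy.txt", ".csv")
    for split in ("train", "val", "test")
]


def missing_snips_files(data_dir: str = "examples/snips_preprocessed") -> list:
    """Return the expected file names that do not exist in `data_dir`."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_snips_files()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All SNIPS files are present.")
```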
