# Learning Universal Authorship Representations

This is the official repository for the EMNLP 2021 paper ["Learning Universal Authorship Representations"](https://aclanthology.org/2021.emnlp-main.70/). The paper studies whether authorship representations learned in one domain transfer to another. To do so, we conduct the first large-scale study of cross-domain transfer for authorship verification, considering zero-shot transfer across three disparate domains: Amazon reviews, fanfiction short stories, and Reddit comments.

## HuggingFace
LUAR model variations are now available on HuggingFace! They can be found [here](https://huggingface.co/collections/rrivera1849/luar-65133328387d403b2e6f33a2).
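
As a minimal sketch of how one of these checkpoints might be loaded with `transformers`: the snippet below assumes that the checkpoint ships its own model code (hence `trust_remote_code=True`), that it takes inputs shaped `(batch_size, episode_length, token_max_length)`, and that it returns one embedding per episode. `model_name`, `flatten_episodes`, and `embed_episodes` are illustrative names, not part of this repository; check the model card of the checkpoint you pick for its exact usage.

```python
from typing import List, Tuple


def flatten_episodes(episodes: List[List[str]]) -> Tuple[List[str], int, int]:
    """Flatten equally sized episodes (lists of texts by one author) into one list.

    Returns the flat list plus (batch_size, episode_length) so the tokenizer
    output can be reshaped back into episodes afterwards.
    """
    if not episodes:
        raise ValueError("need at least one episode")
    episode_length = len(episodes[0])
    if episode_length == 0 or any(len(ep) != episode_length for ep in episodes):
        raise ValueError("all episodes must have the same, non-zero length")
    flat = [text for ep in episodes for text in ep]
    return flat, len(episodes), episode_length


def embed_episodes(model_name: str, episodes: List[List[str]], token_max_length: int = 64):
    """Embed each episode with a LUAR checkpoint from the Hub (assumed interface)."""
    # Imported lazily so the helper above stays usable without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Assumption: the checkpoint defines its own model class on the Hub.
    model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
    model.eval()

    flat, batch_size, episode_length = flatten_episodes(episodes)
    batch = tokenizer(flat, max_length=token_max_length, padding="max_length",
                      truncation=True, return_tensors="pt")
    # Assumption: the model expects (batch_size, episode_length, token_max_length).
    inputs = {
        key: batch[key].reshape(batch_size, episode_length, -1)
        for key in ("input_ids", "attention_mask")
    }
    with torch.no_grad():
        return model(**inputs)  # assumed: one embedding vector per episode
```

For example, `embed_episodes(name, [["text a"] * 8, ["text b"] * 8])` would embed two episodes of eight texts each, where `name` is the identifier of a checkpoint from the collection above.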

## Installation
Run the following commands to create an environment and install all the required packages:
```bash
python3 -m venv vluar
. ./vluar/bin/activate
pip3 install -U pip
pip3 install -r requirements.txt
```

## Training

### UAR_Play
```bash
python main.py \
    --dataset_name drama \
    --data_path ../data/full_data/ \
    --model_name sentence-transformers/all-distilroberta-v1 \
    --do_learn --validate --evaluate \
    --gpus 1 \
    --experiment_id uar_play \
    --validate_every 1 \
    --batch_size 1 \
    --episode_length 8 \
    --token_max_length 64 \
    --embedding_dim 512
```

### UAR_Scene
```bash
python main.py \
    --dataset_name drama \
    --data_path ../data/scene_data/ \
    --model_name sentence-transformers/all-distilroberta-v1 \
    --do_learn --validate --evaluate \
    --gpus 1 \
    --experiment_id uar_scene \
    --validate_every 1 \
    --batch_size 8 \
    --episode_length 8 \
    --token_max_length 64 \
    --embedding_dim 512
```
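
The two training commands above share every flag except `--data_path`, `--experiment_id`, and `--batch_size`. As a convenience, they could be generated from one shared configuration. The sketch below is a hypothetical launcher (`SHARED_ARGS`, `RUNS`, and `build_command` are illustrative names, not part of this repository) that only assembles the argument lists shown above and prints them.

```python
import shlex
import sys
from typing import Dict, List

# Flags shared by both runs, copied from the commands above.
SHARED_ARGS: List[str] = [
    "--dataset_name", "drama",
    "--model_name", "sentence-transformers/all-distilroberta-v1",
    "--do_learn", "--validate", "--evaluate",
    "--gpus", "1",
    "--validate_every", "1",
    "--episode_length", "8",
    "--token_max_length", "64",
    "--embedding_dim", "512",
]

# Per-run settings: the only values that differ between UAR_Play and UAR_Scene.
RUNS: Dict[str, Dict[str, str]] = {
    "uar_play": {"data_path": "../data/full_data/", "batch_size": "1"},
    "uar_scene": {"data_path": "../data/scene_data/", "batch_size": "8"},
}


def build_command(experiment_id: str) -> List[str]:
    """Return the argument list for one training run."""
    run = RUNS[experiment_id]
    return [
        sys.executable, "main.py",
        "--experiment_id", experiment_id,
        "--data_path", run["data_path"],
        "--batch_size", run["batch_size"],
        *SHARED_ARGS,
    ]


if __name__ == "__main__":
    for experiment_id in RUNS:
        # Print instead of executing; pass the list to subprocess.run to launch it.
        print(shlex.join(build_command(experiment_id)))
```

Keeping the differences in `RUNS` makes it easy to see that UAR_Play trains on the full-play data with a batch size of 1, while UAR_Scene trains on the scene-level data with a batch size of 8.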

## Testing 

The results of Table 2 (on DramaCV) are presented in the notebook `evaluate.ipynb`, which uses the best-performing model from each training run.
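
For context, authorship verification with learned representations is commonly scored by comparing two embeddings with cosine similarity. The sketch below illustrates that idea only. It is not taken from `evaluate.ipynb`, and `cosine_similarity`, `same_author`, and the `threshold` value are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_u * norm_v)


def same_author(u: Sequence[float], v: Sequence[float], threshold: float = 0.5) -> bool:
    """Predict 'same author' when similarity reaches a (hypothetical) threshold."""
    return cosine_similarity(u, v) >= threshold
```

In practice the threshold would be chosen on validation data rather than fixed in advance.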