# HyperLoRA

Code for the paper "HyperLoRA: Efficient Cross-task Generalization via Low-Rank Adapters Generation"

Adapting pre-trained language models (PLMs) for cross-task generalization is a crucial research area in NLP. Fine-tuning and in-context learning are effective ways to adapt LMs to emerging tasks, but they can be costly and inefficient. Fine-tuning requires gradient updates for every new task, and in-context learning lengthens every input with demonstrations.
Recently, some researchers have pursued efficient task adaptation via hypernetworks. A hypernetwork is a meta-network that generates task-specific weights from task-oriented information without any optimization.
However, hypernetwork training often lacks stability because the optimization signal is indirect and the task information is not adequately representative.
Moreover, previous work trains hypernetworks on general corpora, which makes them struggle in few-shot adaptation.
To address these issues, we introduce HyperLoRA, a hypernetwork that generates LoRA parameters. We propose a paradigm of hypernetwork pre-training on instruction-following data followed by generalization fine-tuning on sparse task data. Furthermore, we use a weight-space loss and an automatic demonstration selection strategy to improve training stability and performance.
Experimental results and analysis on four benchmark datasets (P3, S-NI, BBH, and SuperGLUE) show that the proposed approach generalizes flexibly and achieves superior performance.

## Overview
<div align=center>
<img src="resource/model.png" width="75%" height="75%" />
</div>

As illustrated in the figure, HyperLoRA is a hypernetwork that converts task instructions into LoRA modules. It consists of three essential elements: a **text encoder** that transforms task information into continuous representations, a collection of trainable embeddings, and a **P-generator** that lets the encoded instructions interact with these embeddings to synthesize the LoRA parameters.
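To make the data flow from encoded instruction to LoRA weights concrete, here is a toy, pure-Python sketch. It assumes the P-generator can be modeled as single-head cross-attention, where one trainable query embedding per LoRA factor attends over the text-encoder outputs, followed by a linear head that is reshaped into the factors `A` and `B`. The class and function names are hypothetical, and the real architecture in `src/models/hypernet_t5.py` and `src/models/hyperlora.py` will differ.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: Vector, tokens: list[Vector]) -> Vector:
    """Scaled dot-product attention of one query over the encoded instruction tokens."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * t for q, t in zip(query, tok)) for tok in tokens])
    return [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(len(query))]


class ToyLoRAGenerator:
    """Hypothetical P-generator: trainable queries attend to the encoded
    instruction, and linear heads map the results to LoRA factors A and B."""

    def __init__(self, d_model: int, d_in: int, d_out: int, rank: int, seed: int = 0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # One trainable query embedding per LoRA factor (A and B).
        self.queries = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(2)]
        # Heads projecting attended vectors to flattened A (rank x d_in) and B (d_out x rank).
        self.head_a = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(rank * d_in)]
        self.head_b = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_out * rank)]

    def __call__(self, encoded: list[Vector]) -> tuple[Matrix, Matrix]:
        flat_a = matvec(self.head_a, cross_attend(self.queries[0], encoded))
        flat_b = matvec(self.head_b, cross_attend(self.queries[1], encoded))
        a = [flat_a[i * self.d_in:(i + 1) * self.d_in] for i in range(self.rank)]
        b = [flat_b[i * self.rank:(i + 1) * self.rank] for i in range(self.d_out)]
        return a, b


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    """Frozen weight plus the generated low-rank update: W x + (alpha / r) * B (A x)."""
    scale = alpha / len(a)
    update = matvec(b, matvec(a, x))
    return [base + scale * u for base, u in zip(matvec(w, x), update)]


if __name__ == "__main__":
    # Stand-in for the text encoder's output: two tokens with d_model = 4.
    encoded_instruction = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2]]
    a, b = ToyLoRAGenerator(d_model=4, d_in=3, d_out=3, rank=2)(encoded_instruction)
    frozen_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(lora_forward(frozen_w, a, b, [1.0, 2.0, 3.0], alpha=16.0))
```

The key property this illustrates is that no gradient step is taken at adaptation time. A new instruction only requires one forward pass through the generator to obtain task-specific `A` and `B`, while the frozen weight `W` is shared across tasks.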

## Note
Some of the code is copied from the Hugging Face Transformers library; the copyright of that code remains with the Hugging Face team.


## Structure

```
HyperLoRA
 |-- data
 |-- scripts                      # running scripts
 |    |-- finetune                      # scripts for generalization fine-tuning
 |    |    |-- finetune_hyperlora.sh    # GLUE
 |    |    |-- bbh_inference_finetune_hyperlora.sh    # BBH Few-shot Finetune
 |    |-- fs_inference                  # scripts for few-shot inference
 |    |    |-- bbh_inference_hyperlora.sh     # BBH Few-shot
 |    |    |-- glue_inference_hyperlora.sh    # GLUE and SuperGLUE Few-shot
 |    |    |-- inference_lorahub.sh           # inference for lorahub
 |    |-- pretrain                      # scripts for pre-train
 |    |    |-- pretrain_glue_hyperlora.sh     # train hyperlora with GLUE
 |    |    |-- pretrain_hyperlora_bart.sh     # bart model to initialize hyperlora
 |    |    |-- pretrain_hyperlora_p3.sh       # train hyperlora with P3
 |    |    |-- pretrain_hyperlora_sni.sh      # train hyperlora with S-NI
 |    |    |-- pretrain_hyperlora.sh          # pre-train hyperlora on FLAN
 |-- src                               # source code   
 |    |-- dataset
 |    |    |-- glue_tasks.py                # glue tasks
 |    |    |-- super_glue.py                # super glue tasks
 |    |    |-- sni_dataset.py               # sni tasks
 |    |    |-- multitask_sampler.py         # sampler for multi-task training
 |    |-- metrics
 |    |    |-- metrics.py                   # metric
 |    |-- models 
 |    |    |-- hyperlora.py                 # main model
 |    |    |-- hypernet_t5.py               # hypernetwork model
 |    |    |-- modeling_t5_lora.py          # t5 + lora
 |    |-- trainer
 |    |    |-- seq2seq_trainer.py 
 |    |-- utils
 |    |    |-- arguments.py                 # arguments
 |    |    |-- prompts.py                   # prompt templates
 |    |    |-- utils.py
 |-- bbh_hyperlora_inference_each_task.py # BBH Few-shot inference code
 |-- finetune.py 
 |-- lorahub_inference.py 
 |-- p3_hyperlora_inference.py 
 |-- pretraining.py 
 |-- pretraining_p3.py 
 |-- pretraining_glue.py
 |-- requirements.txt
 |-- README.md

```

## Dataset
All datasets used in our paper except S-NI can be downloaded from the Hugging Face Hub (https://huggingface.co/datasets). The download paths are listed below:
1. FLAN v2: https://huggingface.co/datasets/lorahub/flanv2
2. P3: https://huggingface.co/datasets/bigscience/P3
3. S-NI: https://github.com/allenai/natural-instructions/archive/refs/tags/v2.6.tar.gz
4. BBH: https://huggingface.co/datasets/lukaemon/bbh
5. SuperGLUE: https://huggingface.co/datasets/super_glue
6. GLUE: https://huggingface.co/datasets/glue
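The download paths above can be wrapped in a small helper. The sketch below is illustrative only: the names `HF_DATASETS`, `load_hf_dataset`, and `download_sni` are hypothetical and not part of this repository. It assumes the third-party `datasets` package is installed for the Hub datasets.

```python
import os
import tarfile
import urllib.request
from typing import Optional

# Hugging Face Hub IDs copied from the list above.
HF_DATASETS = {
    "flan_v2": "lorahub/flanv2",
    "p3": "bigscience/P3",
    "bbh": "lukaemon/bbh",
    "super_glue": "super_glue",
    "glue": "glue",
}
# S-NI is not on the Hub; it is distributed as a GitHub release tarball.
SNI_URL = "https://github.com/allenai/natural-instructions/archive/refs/tags/v2.6.tar.gz"


def load_hf_dataset(name: str, config: Optional[str] = None):
    """Load one of the Hub datasets; BBH, GLUE and SuperGLUE need a per-task config name."""
    from datasets import load_dataset  # third-party: pip install datasets
    return load_dataset(HF_DATASETS[name], config)


def download_sni(dest_dir: str) -> str:
    """Download the S-NI v2.6 archive and unpack it into dest_dir (needs network access)."""
    os.makedirs(dest_dir, exist_ok=True)
    archive_path, _ = urllib.request.urlretrieve(SNI_URL)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return dest_dir
```

For example, `load_hf_dataset("bbh", "boolean_expressions")` loads a single BBH task, and `load_hf_dataset("super_glue", "boolq")` loads BoolQ. Recent releases of `datasets` may refuse script-based repositories or redirect legacy IDs, so you may need to pin an older version or switch to a namespaced dataset ID.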


## HyperLoRA Pre-training

```bash
bash scripts/pretrain/pretrain_hyperlora.sh
```

## Cross-Task Generalization (P3 & S-NI)

1. P3
```bash
bash scripts/pretrain/pretrain_hyperlora_p3.sh
```

2. S-NI
```bash
bash scripts/pretrain/pretrain_hyperlora_sni.sh
```

## Few-shot Adaptation (BBH & SuperGLUE)

1. BBH Few-shot inference
```bash
bash scripts/fs_inference/bbh_inference_hyperlora.sh
```

2. BBH Few-shot + generalization fine-tuning
```bash
bash scripts/finetune/bbh_inference_finetune_hyperlora.sh
```

3. SuperGLUE Few-shot inference
```bash
bash scripts/fs_inference/glue_inference_hyperlora.sh
```
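The `n_demonstrations` argument controls how many demonstrations HyperLoRA receives as task information. The following minimal sketch packs an instruction and its demonstrations into a single hypernetwork input string. The `Definition:`/`Example` template and the names `Example` and `build_hypernet_input` are hypothetical; the repository's real templates live in `src/utils/prompts.py`. Taking the first n demonstrations is only a placeholder and is not the automatic demonstration selection strategy described in the paper.

```python
from dataclasses import dataclass


@dataclass
class Example:
    source: str  # demonstration input
    target: str  # demonstration output


def build_hypernet_input(instruction: str, demos: list[Example], n_demonstrations: int) -> str:
    """Hypothetical template: the instruction followed by the first n demonstrations."""
    parts = [f"Definition: {instruction.strip()}"]
    for i, demo in enumerate(demos[:n_demonstrations], start=1):
        parts.append(f"Example {i} input: {demo.source.strip()}")
        parts.append(f"Example {i} output: {demo.target.strip()}")
    return "\n".join(parts)


demos = [Example("not True", "False"), Example("True and True", "True")]
print(build_hypernet_input("Answer True or False.", demos, n_demonstrations=1))
# Definition: Answer True or False.
# Example 1 input: not True
# Example 1 output: False
```

Asking for more demonstrations than are available simply uses all of them, because slicing past the end of a list is safe in Python.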

## Arguments
- `n_demonstrations`: number of demonstrations fed to HyperLoRA as input
- `hypelora_name_or_path`: path to the HyperLoRA model
- `model_name_or_path`: path to the pre-trained language model
- `lora_path`: path to the pre-optimized LoRA weights
- `loss_beta`: the hyperparameter $\lambda$ that controls the relative weight of the weight-space loss
- `output_dir`: directory in which to save the trained model
- `pretrain_checkpoint`: path to the pre-trained HyperLoRA checkpoint
- `finetune`: whether to fine-tune HyperLoRA
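As a sketch of how `loss_beta` might enter the objective, assume the following. The weight-space loss is the mean squared error between the generated LoRA parameters and the pre-optimized LoRA loaded from `lora_path`, and it is added to the usual task loss as $\mathcal{L} = \mathcal{L}_{\text{task}} + \lambda \mathcal{L}_{\text{weight}}$. The function names are hypothetical; see `src/models/hyperlora.py` for the actual objective.

```python
from typing import Sequence


def weight_space_loss(generated: Sequence[float], reference: Sequence[float]) -> float:
    """Assumed form: MSE between flattened generated and pre-optimized LoRA weights."""
    if len(generated) != len(reference) or not generated:
        raise ValueError("expected two non-empty parameter vectors of equal length")
    return sum((g - r) ** 2 for g, r in zip(generated, reference)) / len(generated)


def total_loss(task_loss: float, generated: Sequence[float],
               reference: Sequence[float], loss_beta: float) -> float:
    """Task loss plus the weight-space term scaled by lambda (`loss_beta`)."""
    return task_loss + loss_beta * weight_space_loss(generated, reference)


print(total_loss(2.0, [0.5, -0.5], [0.0, 0.0], loss_beta=0.5))  # 2.125
```

Setting `loss_beta` to 0 recovers pure task-loss training. Larger values pull the generated LoRA weights toward the pre-optimized ones, which gives the hypernetwork a more direct optimization signal.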