# Structure
```
├── README.md
├── code
│   ├── howto100m
│   │   ├── caption_ir_analysis.py
│   │   ├── extract_caption.py
│   │   ├── ir_eval.py
│   │   ├── preprocess.py
│   │   ├── process_asr.py
│   │   └── script.py
│   ├── models
│   │   ├── constants.py
│   │   ├── ir
│   │   │   ├── elastic_search.py
│   │   │   └── run_elastic.sh
│   │   ├── reranking
│   │   │   ├── analysis.py
│   │   │   ├── bert_reranking.py
│   │   │   ├── convert_data_downstream.py
│   │   │   ├── gen_data.py
│   │   │   ├── inference.py
│   │   │   └── recall.py
│   │   └── utils.py
│   └── run_rerank.sh
└── data
    ├── gold.rerank.org.t30.dev.json
    ├── gold.rerank.org.t30.test.json
    ├── gold.rerank.org.t30.train.json
    └── sample15k.search.rerank.json
```
# Code
## Retrieval
We follow [Wieting et al.](https://github.com/jwieting/paraphrastic-representations-at-scale) to encode each step and goal. We then retrieve the top-30 goals for each step.
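
The retrieval step can be sketched as a nearest-neighbour search over precomputed embeddings. The snippet below is only an illustration, not the code in this repository: it assumes each step and goal has already been encoded into a vector, and it assumes cosine similarity as the scoring function. The names `cosine` and `retrieve_top_k` are hypothetical.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_u * norm_v)


def retrieve_top_k(step_vec, goal_vecs, k=30):
    """Return indices of the k goals most similar to the step, best first.

    Assumption: cosine similarity is the ranking score; the repository
    may use a different similarity or an index-based search.
    """
    return heapq.nlargest(
        k,
        range(len(goal_vecs)),
        key=lambda i: cosine(step_vec, goal_vecs[i]),
    )


# Toy 2-d vectors standing in for real sentence embeddings.
goals = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
step = [1.0, 0.1]
top2 = retrieve_top_k(step, goals, k=2)
```

With `k=30` this yields the candidate list that the reranking model then reorders.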
## Reranking
The code for the reranking model is in `code/models/reranking`. The training and test scripts are in `run_rerank.sh`.

# Data
## Automatic Evaluation
We split the ground-truth step-goal links into `train`, `dev`, and `test` splits.
Each entry in `gold.rerank.org.t30.{split}.json` contains the step, the retrieved candidate goals, and the ground-truth linked goal.
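
As a rough illustration of how such entries can be scored, the sketch below computes recall@k over a list of entries. The field names `step`, `candidates`, and `gold` are assumptions made for this example and may not match the actual keys in the JSON files; the two entries are made-up toy data, not taken from the dataset.

```python
def recall_at_k(entries, k):
    """Fraction of entries whose gold goal is among the first k candidates.

    Assumption: each entry is a dict with a ranked "candidates" list and
    a single "gold" goal; these key names are hypothetical.
    """
    if not entries:
        return 0.0
    hits = sum(1 for e in entries if e["gold"] in e["candidates"][:k])
    return hits / len(entries)


# Made-up entries for illustration only.
toy_entries = [
    {
        "step": "Preheat the oven",
        "candidates": ["How to Bake a Cake", "How to Clean an Oven"],
        "gold": "How to Bake a Cake",
    },
    {
        "step": "Whisk the eggs",
        "candidates": ["How to Boil Eggs", "How to Make an Omelette"],
        "gold": "How to Make an Omelette",
    },
]

r1 = recall_at_k(toy_entries, 1)  # gold is ranked first in one of two entries
r2 = recall_at_k(toy_entries, 2)  # gold is within the top two in both entries
```

To apply this to the released files, load a split with `json.load` and adapt the keys to the ones actually present in each entry.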

## Randomly Sampled Data
We apply our method to the wikiHow corpus and release a random sample of 15k entries from the full corpus here. Our human evaluation is conducted on a subset of this random sample.
We will release the whole corpus upon acceptance.