# LevyHolt Dedicated Subsets

## Directory Descriptions
- `data_en_levyholt`: various subsets of the LevyHolt dataset (in English);
- `data_zh_levyholt`: a few LevyHolt subsets in Chinese. The Chinese premises and hypotheses are machine-translated 
from their corresponding English entries, similarly to Li et al., 2022;
- `booqa_hypoonly`: hypothesis-only data for the BoOQA dataset (and the McKenna dataset).

The sub-directory names have the following meanings:
- `full`: the full LevyHolt dataset;
- `full_hypo_only`: the hypothesis-only data for the full LevyHolt dataset;
- `directional`: the LevyHolt directional subset;
- `directional_hypo_only`: the hypothesis-only data for the directional subset;
- `dirfalse_unr`: the *DirFalse-Unrelated* subset;
- `dirfalse_unr_hypo_only`: the hypothesis-only data for the *DirFalse-Unrelated* subset;
- `dirtrue_unr`: the *DirTrue-Unrelated* subset;
- `dirtrue_unr_hypo_only`: the hypothesis-only data for the *DirTrue-Unrelated* subset;
- `para_dirfalse`: the *Paraphrases-DirFalse* subset;
- `para_dirfalse_hypo_only`: the hypothesis-only data for the *Paraphrases-DirFalse* subset;
- `para_dirtrue`: the *Paraphrases-DirTrue* subset;
- `para_dirtrue_hypo_only`: the hypothesis-only data for the *Paraphrases-DirTrue* subset;
- `symmetric`: the *symmetric* subset;
- `symmetric_hypo_only`: the hypothesis-only data for the *symmetric* subset.

## File Format
Each sub-directory contains a `train.txt`, a `dev.txt` and a `test.txt`. Each is a TSV file in which every line has four 
columns, in this order: hypothesis, premise, label, language.
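
As an illustration of the column order described above, here is a minimal Python sketch that reads one split into named records. It is not part of this repository: the names `Example` and `load_split` are hypothetical, and the sketch assumes UTF-8 files with no header row and no quoted fields. Labels are kept as raw strings because their exact encoding is not specified here.

```python
import csv
from typing import List, NamedTuple


class Example(NamedTuple):
    """One line of a split file, in the documented column order."""
    hypothesis: str
    premise: str
    label: str      # kept as the raw string found in the file
    language: str


def load_split(path: str) -> List[Example]:
    """Read a tab-separated split file into a list of Example records.

    Assumes (not confirmed by the README): UTF-8 encoding, no header row,
    and no quoting, so tabs are treated as the only field separator.
    """
    examples = []
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for line_no, row in enumerate(reader, start=1):
            if not row:
                continue  # skip blank lines
            if len(row) != 4:
                raise ValueError(
                    f"{path}:{line_no}: expected 4 columns, got {len(row)}"
                )
            examples.append(Example(*row))
    return examples
```

A call might look like `load_split("data_en_levyholt/directional/dev.txt")`, assuming the sub-directories sit directly under the top-level data directories; adjust the path if the layout differs.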