This folder contains the subword segmentations of the four datasets used. Each folder contains two segmentation files:
    * bertseg.txt: segmentations for each word in the corpus
    * bertseg-regularization.txt: up to 10 segmentations for each word in the corpus, each with its negative log-likelihood.
The bertseg-regularization results for WMT15 and WMT16 are provided in the zip file due to size limitations.
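
One possible way to use the negative log-likelihoods in bertseg-regularization.txt is to turn them into sampling probabilities over a word's candidate segmentations. The sketch below is illustrative only: it assumes the scores are natural-log values and that normalizing over the listed candidates is acceptable. The function names and the example candidates are hypothetical, and parsing is omitted because the file layout is not described here.

```python
import math
import random


def nll_to_probs(nlls):
    """Convert negative log-likelihoods into normalized probabilities."""
    # Shift by the minimum NLL so the largest weight is exp(0) = 1,
    # which avoids underflow for large NLL values.
    lowest = min(nlls)
    weights = [math.exp(-(nll - lowest)) for nll in nlls]
    total = sum(weights)
    return [w / total for w in weights]


def sample_segmentation(candidates, rng=random):
    """Sample one segmentation from (segmentation, nll) pairs for a word."""
    segmentations = [seg for seg, _ in candidates]
    probs = nll_to_probs([nll for _, nll in candidates])
    return rng.choices(segmentations, weights=probs, k=1)[0]


# Hypothetical candidates for a single word (not taken from the data files).
candidates = [
    (["unrelated"], 0.7),
    (["un", "##related"], 1.2),
    (["un", "##rel", "##ated"], 3.5),
]
probs = nll_to_probs([nll for _, nll in candidates])
chosen = sample_segmentation(candidates, random.Random(0))
```

A lower negative log-likelihood gives a higher sampling probability, so the most likely segmentation is chosen most often while the alternatives still appear occasionally.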
