Files for "A Neural Pairwise Ranking Model for Readability Assessment"

The following files contain our implementation of NPRM, along with the code used to generate neural and non-neural ML baseline scores and to calculate evaluation metrics.

The code has not been fully cleaned up. We will tidy up the formatting upon acceptance of the paper.


Files:

pairwise_readability_ranking.py - Contains code to group texts by slug, generate pairwise permutation features, and train the SVMRank and NPRM models with a BERT or mBERT base.

bert_ml_scores.py - Contains code to train BERT and mBERT regression and classification models.

ml_scores.py - Contains code to train non-neural regression and classification models with word embeddings as input.

baseline_evaluation.py - Contains code to compute ranking-, classification-, and regression-specific metrics from model predictions.
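As an illustration of the kind of ranking metric baseline_evaluation.py reports, the sketch below computes Kendall's tau (the tau-a variant, with no tie correction) between gold reading levels and predicted scores for one group of texts. This is a minimal standalone sketch, not the repository's code; the function name kendall_tau_a is hypothetical, and the metrics actually used are defined in baseline_evaluation.py.

```python
from itertools import combinations


def kendall_tau_a(true_levels, predicted_scores):
    """Hypothetical helper: Kendall's tau-a between gold levels and predictions.

    tau_a = (concordant - discordant) / (n * (n - 1) / 2).
    Pairs tied in either list count as neither concordant nor discordant.
    """
    if len(true_levels) != len(predicted_scores):
        raise ValueError("inputs must have the same length")
    n = len(true_levels)
    if n < 2:
        raise ValueError("need at least two items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both lists order items i and j the same way.
        sign = (true_levels[i] - true_levels[j]) * (predicted_scores[i] - predicted_scores[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Example: four versions of one article; the model swaps the middle two.
tau = kendall_tau_a([0, 1, 2, 3], [0.1, 0.4, 0.3, 0.9])
print(tau)  # 5 concordant, 1 discordant -> 4/6
```

In a per-slug evaluation, a score like this would typically be computed for each slug group and then averaged, though how the repository aggregates its metrics is not specified here.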

The "results" folder contains .csv files with the prediction results for each dataset. For example, 'newsela_en_baselines.csv' contains results for models predicting on the Newsela EN dataset.
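The slug grouping and pairwise permutation step described for pairwise_readability_ranking.py can be sketched as follows. This is a minimal sketch under stated assumptions, not the repository's implementation: it assumes each slug identifies one source article and its rewritten versions at different reading levels, it uses a hypothetical record layout of (slug, text, level) tuples, and it assumes a labeling convention where 1 means the first text of a pair has the higher reading level. Pairs with equal levels are skipped.

```python
from collections import defaultdict
from itertools import permutations


def group_by_slug(records):
    """Group hypothetical (slug, text, level) records so only versions of one article are compared."""
    groups = defaultdict(list)
    for slug, text, level in records:
        groups[slug].append((text, level))
    return dict(groups)


def pairwise_permutations(groups):
    """Build every ordered pair of texts within each slug group.

    Assumed label convention: 1 if the first text has the higher reading
    level, else 0. Pairs with equal levels carry no ordering and are skipped.
    """
    pairs = []
    for slug, items in groups.items():
        for (text_a, level_a), (text_b, level_b) in permutations(items, 2):
            if level_a == level_b:
                continue
            pairs.append((slug, text_a, text_b, int(level_a > level_b)))
    return pairs


# Example: article "a1" has three versions, article "a2" has two.
records = [("a1", "t0", 0), ("a1", "t1", 1), ("a1", "t2", 2),
           ("a2", "u0", 0), ("a2", "u1", 1)]
pairs = pairwise_permutations(group_by_slug(records))
print(len(pairs))  # 3*2 + 2*1 ordered pairs
```

Using permutations rather than combinations yields both orderings of every pair, so each pair appears once with label 1 and once with label 0, which keeps the pairwise training labels balanced.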