# README

## Project Overview

This project introduces ExpertEase, a framework for improving LLM performance on grade-specific document simplification. The repository contains the code, datasets, and output files needed to reproduce and understand the complete workflow.

## Directory Structure

```
├── codes
│   ├── api_request_parallel_processor.py
│   ├── longformer_train_dev_test_codes
│   │   ├── longformer_environment.yml
│   │   ├── longformer_train_test_predict.py
│   │   ├── longformer-hyperparameter-tunning.yaml
├── datasets
│   ├── LLMs_test.jsonl
├── outputs
│   ├── 1st_stage_LLMs_output.zip
│   ├── 2nd_stage_LLMs_output.zip
│   ├── 3rd_stage_LLMs_output.zip
```

## Description of Files and Directories

### Codes

- **api_request_parallel_processor.py**: This script processes API requests in parallel. Issuing multiple API calls concurrently reduces the time required for data processing.

- **longformer_train_dev_test_codes**:
  - **longformer_environment.yml**: This file contains the conda environment specifications required to run the Longformer model training and testing scripts. It includes all necessary dependencies and packages.
  - **longformer_train_test_predict.py**: This is the main script for training, testing, and making predictions using the Longformer model. It includes all necessary functions and classes for data processing, model training, and evaluation.
  - **longformer-hyperparameter-tunning.yaml**: This file contains the configuration for hyperparameter tuning of the Longformer model. It defines the relevant parameters for optimizing model performance.
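
The parallel request handling described for `api_request_parallel_processor.py` can be illustrated with a minimal sketch. This is not the script's actual code: `run_parallel` and `fake_send` are hypothetical names, and a standard-library thread pool stands in for whatever concurrency mechanism the script really uses. The stand-in worker makes no network call, so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(payloads, send_request, max_workers=4):
    """Apply send_request to every payload concurrently.

    Results are returned in the same order as the input payloads.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send_request, payloads))


def fake_send(payload):
    # Hypothetical stand-in for a real API call; it only echoes the prompt.
    return {"prompt": payload["prompt"], "response": payload["prompt"].upper()}
```

Calling `run_parallel([{"prompt": "a"}, {"prompt": "b"}], fake_send)` sends both payloads through the pool. In real use, `fake_send` would be replaced by a function that issues the API request.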

### Datasets

- **LLMs_test.jsonl**: This JSON Lines file contains the test data for evaluating LLMs on grade-specific document simplification. Each line in the file represents a separate test instance.
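
Because each line is a standalone JSON object, the file can be read one instance at a time. The sketch below assumes only the JSON Lines layout stated above; `read_jsonl` is a hypothetical helper, and no field names are assumed because the schema is not documented here.

```python
import json


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, `instances = list(read_jsonl("datasets/LLMs_test.jsonl"))` loads every test instance into memory, and `len(instances)` then gives the number of instances.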


### Outputs

- **1st_stage_LLMs_output.zip**: This zip file contains the results from the first stage of the LLM evaluation. It includes model predictions and evaluation metrics.
- **2nd_stage_LLMs_output.zip**: This zip file contains the results from the second stage of the LLM evaluation.
- **3rd_stage_LLMs_output.zip**: This zip file contains the results from the third and final stage of the LLM evaluation.
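
The archives can be inspected or unpacked with Python's standard `zipfile` module. This is a small sketch: `list_archive` and `extract_archive` are hypothetical helper names, and the internal layout of the archives is not assumed.

```python
import zipfile


def list_archive(path):
    """Return the names of all members stored in a zip archive."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()


def extract_archive(path, dest):
    """Extract every member of a zip archive into the directory dest."""
    with zipfile.ZipFile(path) as zf:
        zf.extractall(dest)
```

For example, `list_archive("outputs/1st_stage_LLMs_output.zip")` shows what the first-stage archive contains before anything is extracted.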

## Dataset Sources

This project utilizes data from the following sources:

- **Newsela**: <https://newsela.com/data/>
- **CLEAR**: <https://github.com/scrosseye/CLEAR-Corpus>
- **Weebit**: You may contact sowmya@sfs.uni-tuebingen.de for the dataset.