# PT-M2
This repository contains the source code for ["Revisiting Grammatical Error Correction 
Evaluation and Beyond"](). The paper investigates whether recent pretraining-based (PT-based) 
metrics such as BERTScore and BARTScore are suitable for the GEC evaluation task, and proposes 
a novel PT-based GEC metric, **PT-M2**. PT-M2 uses pretrained knowledge to evaluate GEC system 
outputs and measures whether a GEC system corrects the more important errors.

## Overview
PT-M2 takes advantage of both PT-based metrics (e.g. BERTScore, BARTScore) and
edit-based metrics (e.g. M2, ERRANT). Rather than using PT-based metrics directly to
score hypothesis-reference sentence pairs, we apply them at the edit level to compute 
a score for each edit. Experiments show that PT-M2 correlates better with human 
judgements at both the sentence level and the corpus level, and that it is well suited 
to evaluating high-performing GEC systems.


As an illustration, PT-M2 is computed as follows:
<img src="img/model.png" class="center">
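
The edit-level idea described above can be sketched as a weighted F0.5: each edit contributes its weight instead of a count of 1. This is a minimal sketch and not the repository's implementation. The function name `weighted_f_beta`, the edit representation, and the weights are all hypothetical; in PT-M2 the per-edit scores would come from a PT-based scorer such as BERTScore or BARTScore.

```python
from typing import Dict, Hashable, Set, Tuple


def weighted_f_beta(
    hyp_edits: Set[Hashable],
    ref_edits: Set[Hashable],
    weights: Dict[Hashable, float],
    beta: float = 0.5,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F_beta), counting each edit by its weight."""
    matched = hyp_edits & ref_edits
    tp = sum(weights[e] for e in matched)
    hyp_total = sum(weights[e] for e in hyp_edits)
    ref_total = sum(weights[e] for e in ref_edits)

    # Assumed convention: an empty denominator yields a ratio of 1.
    precision = tp / hyp_total if hyp_total > 0 else 1.0
    recall = tp / ref_total if ref_total > 0 else 1.0
    if precision + recall == 0:
        return precision, recall, 0.0

    b2 = beta ** 2
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# Hypothetical edits as (start, end, replacement) tuples with made-up weights.
hyp = {(1, 2, "goes"), (4, 5, "the")}
ref = {(1, 2, "goes"), (6, 7, "on")}
weights = {(1, 2, "goes"): 0.8, (4, 5, "the"): 0.2, (6, 7, "on"): 0.5}

p, r, f = weighted_f_beta(hyp, ref, weights)
print(f"P={p:.3f} R={r:.3f} F0.5={f:.3f}")
```

With every weight set to 1, the function reduces to the usual unweighted edit-level precision, recall, and F0.5.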

## News

- 2022.10.13 Code released
- 2022.10.6 Paper accepted to EMNLP 2022 

## Installation

- Python version >= 3.6
- PyTorch version >= 1.0.0
- Transformers version >= 4.10.0

Install from source:

```sh
git clone https://github.com/pygongnlp/PT-M2.git
cd PT-M2
```
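
To confirm that an environment meets the minimum versions listed above, a small check along these lines may help. It is a sketch, not part of the repository: `check_environment` and `parse_version` are hypothetical helpers, the PyPI distribution names `torch` and `transformers` are assumed, and `importlib.metadata` itself requires Python 3.8 or newer.

```python
import sys
from importlib import metadata  # available from Python 3.8

# Minimum versions taken from the requirements list in this README.
REQUIREMENTS = {"torch": (1, 0, 0), "transformers": (4, 10, 0)}


def parse_version(text):
    """Turn a string such as '1.13.1+cu117' into (1, 13, 1)."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # ignore suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment():
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required")
    for name, minimum in REQUIREMENTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if parse_version(installed) < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{name} {installed} is older than {wanted}")
    return problems


for problem in check_environment():
    print(problem)
```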


## Correlation Experiments
### Original Metrics
* M2\
``python evaluate.py --base m2 --scorer self --output_file m2``
* SentM2\
``python evaluate.py --base sentm2 --scorer self --output_file sentm2``
* ERRANT\
``python evaluate.py --base errant --scorer self --output_file errant``
* SentERRANT\
``python evaluate.py --base senterrant --scorer self --output_file senterrant``

### BERTScore Edit Scorer
* M2\
``python evaluate.py --base m2 --scorer bertscore --model_type bert-base-uncased --output_file bertscore_bertbase_m2``
* SentM2\
``python evaluate.py --base sentm2 --scorer bertscore --model_type bert-base-uncased --output_file bertscore_bertbase_sentm2``
* ERRANT\
``python evaluate.py --base errant --scorer bertscore --model_type bert-base-uncased --output_file bertscore_bertbase_errant``
* SentERRANT\
``python evaluate.py --base senterrant --scorer bertscore --model_type bert-base-uncased --output_file bertscore_bertbase_senterrant``

**Note:** Any PT model that BERTScore supports can be used in our metric, and the `--output_file` name is yours to choose.

### BARTScore Edit Scorer
* M2\
``python evaluate.py --base m2 --scorer bartscore --model_type bart-base --output_file bartscore_bartbase_m2``
* SentM2\
``python evaluate.py --base sentm2 --scorer bartscore --model_type bart-base --output_file bartscore_bartbase_sentm2``
* ERRANT\
``python evaluate.py --base errant --scorer bartscore --model_type bart-base --output_file bartscore_bartbase_errant``
* SentERRANT\
``python evaluate.py --base senterrant --scorer bartscore --model_type bart-base --output_file bartscore_bartbase_senterrant``

**Note:** Any PT model that BARTScore supports can be used in our metric, and the `--output_file` name is yours to choose.
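
The commands in the sections above follow one pattern, so they can be generated rather than typed one by one. The sketch below only builds and prints the command lines (a dry run). The helper `build_commands` and the short tags `bertbase` and `bartbase` are illustrative choices that mirror the output file names used above; only the flags shown in this README are used.

```python
from typing import List

BASES = ["m2", "sentm2", "errant", "senterrant"]

# (scorer, model_type, tag for the output file name).
# A model_type of None means the --model_type flag is omitted.
SCORERS = [
    ("self", None, None),
    ("bertscore", "bert-base-uncased", "bertbase"),
    ("bartscore", "bart-base", "bartbase"),
]


def build_commands() -> List[List[str]]:
    """Build one evaluate.py argument list per (scorer, base) pair."""
    commands = []
    for scorer, model_type, tag in SCORERS:
        for base in BASES:
            cmd = ["python", "evaluate.py", "--base", base, "--scorer", scorer]
            if model_type is None:
                output = base
            else:
                cmd += ["--model_type", model_type]
                output = f"{scorer}_{tag}_{base}"
            cmd += ["--output_file", output]
            commands.append(cmd)
    return commands


# Dry run: print the commands instead of executing them.
for cmd in build_commands():
    print(" ".join(cmd))
```

To actually run them, each argument list could be passed to `subprocess.run(cmd, check=True)` from the repository root.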

### Compute correlation

* Gzip the ranking files you generated\
``cd .\gecmetrics ``\
``gzip .\scores\conll14\system_scores_metrics\*.txt``
* Run the system-level evaluation\
``bash run.sh``
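
If the `gzip` command is not available on your platform, the compression step can be done with Python's standard library instead. This is a sketch under the assumption that the score files are the `*.txt` files in the directory named in the command above; `gzip_score_files` is a hypothetical helper, not part of the repository.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def gzip_score_files(score_dir: Path) -> List[Path]:
    """Compress every *.txt file in score_dir to *.txt.gz and remove the original."""
    compressed = []
    for txt_path in sorted(score_dir.glob("*.txt")):
        gz_path = txt_path.with_name(txt_path.name + ".gz")
        with open(txt_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        txt_path.unlink()  # mirrors the gzip CLI, which deletes the input by default
        compressed.append(gz_path)
    return compressed


if __name__ == "__main__":
    # Directory from the command above, relative to the gecmetrics folder.
    target = Path("scores") / "conll14" / "system_scores_metrics"
    for path in gzip_score_files(target):
        print(path)
```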

## Authors
[Peiyuan Gong](https://pygongnlp.github.io/)

[Xuebo Liu](https://sunbowliu.github.io/)

Heyan Huang

Min Zhang


## Contact
Please contact pygongnlp@gmail.com if you have any questions or suggestions.

## Bib
If you find this repo useful, please cite: