@inproceedings{flor-etal-2019-benchmark,
    title = "A Benchmark Corpus of {E}nglish Misspellings and a Minimally-supervised Model for Spelling Correction",
    author = "Flor, Michael  and
      Fried, Michael  and
      Rozovskaya, Alla",
    editor = "Yannakoudakis, Helen  and
      Kochmar, Ekaterina  and
      Leacock, Claudia  and
      Madnani, Nitin  and
      Pil{\'a}n, Ildik{\'o}  and
      Zesch, Torsten",
    booktitle = "Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/iwcs-25-ingestion/W19-4407/",
    doi = "10.18653/v1/W19-4407",
    pages = "76--86",
    abstract = "Spelling correction has attracted a lot of attention in the NLP community. However, models have been usually evaluated on artificially-created or proprietary corpora. A publicly-available corpus of authentic misspellings, annotated in context, is still lacking. To address this, we present and release an annotated data set of 6,121 spelling errors in context, based on a corpus of essays written by English language learners. We also develop a minimally-supervised context-aware approach to spelling correction. It achieves strong results on our data: 88.12{\%} accuracy. This approach can also train with a minimal amount of annotated data (performance reduced by less than 1{\%}). Furthermore, this approach allows easy portability to new domains. We evaluate our model on data from a medical domain and demonstrate that it rivals the performance of a model trained and tuned on in-domain data."
}