Unsupervised Context-Sensitive Spelling Correction of Clinical Free-Text with Word and Character N-Gram Embeddings

Pieter Fivez, Simon Šuster, Walter Daelemans

doi:10.18653/v1/W17-2317

Abstract

We present an unsupervised context-sensitive spelling correction method for clinical free-text that uses word and character n-gram embeddings. Our method generates misspelling replacement candidates and ranks them according to their semantic fit, by calculating a weighted cosine similarity between the vectorized representation of a candidate and the misspelling context. We greatly outperform two baseline off-the-shelf spelling correction tools on a manually annotated MIMIC-III test set, and counter the frequency bias of an optimized noisy channel model, showing that neural embeddings can be successfully exploited to include context-awareness in a spelling correction model.
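The abstract compresses the method into one sentence: generate replacement candidates for a misspelling, then rank them by how well each candidate's vector fits a weighted vector of the surrounding context. The Python sketch that follows illustrates that idea under stated assumptions and is not the authors' implementation. The candidate filter (Damerau-Levenshtein distance of at most 2), the reciprocal-distance weighting of context words, the window size, and every function name (`osa_distance`, `generate_candidates`, `context_vector`, `rank_candidates`) are hypothetical choices made for illustration. The `embed` callable stands in for a word and character n-gram embedding model, and the three-dimensional toy vectors are invented solely so the example runs.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Embed = Callable[[str], Optional[Vector]]  # returns None if a token cannot be embedded


def osa_distance(a: str, b: str) -> int:
    """Damerau-Levenshtein distance (optimal string alignment variant):
    insertions, deletions, substitutions and adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def generate_candidates(word: str, vocab: Sequence[str], max_dist: int = 2) -> List[str]:
    """Assumption: candidates are vocabulary words within max_dist edits."""
    return [v for v in vocab if v != word and osa_distance(word, v) <= max_dist]


def context_vector(tokens: Sequence[str], idx: int, embed: Embed,
                   window: int = 10) -> Optional[Vector]:
    """Weighted sum of context embeddings around tokens[idx].
    Assumption (hypothetical): weight = 1 / distance to the misspelling."""
    total: Optional[Vector] = None
    for j in range(max(0, idx - window), min(len(tokens), idx + window + 1)):
        if j == idx:
            continue  # the misspelling itself is not part of its context
        vec = embed(tokens[j])
        if vec is None:
            continue  # skip tokens the model cannot embed
        w = 1.0 / abs(j - idx)
        if total is None:
            total = [0.0] * len(vec)
        total = [t + w * x for t, x in zip(total, vec)]
    return total


def cosine(u: Vector, v: Vector) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def rank_candidates(tokens: Sequence[str], idx: int, vocab: Sequence[str],
                    embed: Embed, window: int = 10,
                    max_dist: int = 2) -> List[Tuple[str, float]]:
    """Score each candidate by cosine similarity to the weighted context vector."""
    ctx = context_vector(tokens, idx, embed, window)
    scored = []
    for cand in generate_candidates(tokens[idx], vocab, max_dist):
        vec = embed(cand)
        score = cosine(vec, ctx) if vec is not None and ctx is not None else 0.0
        scored.append((cand, score))
    # Highest semantic fit first; break ties by edit distance, then alphabetically.
    scored.sort(key=lambda cs: (-cs[1], osa_distance(tokens[idx], cs[0]), cs[0]))
    return scored


if __name__ == "__main__":
    # Invented toy vectors; a real system would query a trained embedding model.
    toy = {
        "blood": [1.0, 0.0, 0.0],
        "high": [0.8, 0.2, 0.0],
        "pressure": [0.9, 0.1, 0.0],
        "presume": [0.0, 0.0, 1.0],
    }
    tokens = ["patient", "has", "high", "blood", "presure"]
    ranking = rank_candidates(tokens, 4, list(toy) + ["patient", "has"], toy.get, window=2)
    print(ranking[0][0])  # pressure
```

Both "pressure" and "presume" are one edit away from "presure", so edit distance alone cannot separate them; the context ("high blood") is what puts "pressure" first. With a character n-gram model, `embed` could also build vectors for out-of-vocabulary context tokens from their subwords, so the `None` branch mainly matters for a plain word-level lookup such as the toy `dict.get` used here.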

Anthology ID:: W17-2317
Volume:: Proceedings of the 16th BioNLP Workshop
Month:: August
Year:: 2017
Address:: Vancouver, Canada
Editors:: Kevin Bretonnel Cohen, Dina Demner-Fushman, Sophia Ananiadou, Junichi Tsujii
Venue:: BioNLP
SIG:: SIGBIOMED
Publisher:: Association for Computational Linguistics
Pages:: 143–148
URL:: https://preview.aclanthology.org/nschneid-patch-2/W17-2317/
DOI:: 10.18653/v1/W17-2317
Cite (ACL):: Pieter Fivez, Simon Šuster, and Walter Daelemans. 2017. Unsupervised Context-Sensitive Spelling Correction of Clinical Free-Text with Word and Character N-Gram Embeddings. In Proceedings of the 16th BioNLP Workshop, pages 143–148, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):: Unsupervised Context-Sensitive Spelling Correction of Clinical Free-Text with Word and Character N-Gram Embeddings (Fivez et al., BioNLP 2017)
PDF:: https://preview.aclanthology.org/nschneid-patch-2/W17-2317.pdf
