Abstract
We present an unsupervised context-sensitive spelling correction method for clinical free-text that uses word and character n-gram embeddings. Our method generates misspelling replacement candidates and ranks them according to their semantic fit by calculating a weighted cosine similarity between the vectorized representation of a candidate and the misspelling context. We greatly outperform two baseline off-the-shelf spelling correction tools on a manually annotated MIMIC-III test set, and counter the frequency bias of an optimized noisy channel model, showing that neural embeddings can be successfully exploited to include context-awareness in a spelling correction model.
- Anthology ID:
- W17-2317
- Volume:
- BioNLP 2017
- Month:
- August
- Year:
- 2017
- Address:
- Vancouver, Canada
- Editors:
- Kevin Bretonnel Cohen, Dina Demner-Fushman, Sophia Ananiadou, Junichi Tsujii
- Venue:
- BioNLP
- SIG:
- SIGBIOMED
- Publisher:
- Association for Computational Linguistics
- Pages:
- 143–148
- URL:
- https://aclanthology.org/W17-2317
- DOI:
- 10.18653/v1/W17-2317
- Cite (ACL):
- Pieter Fivez, Simon Šuster, and Walter Daelemans. 2017. Unsupervised Context-Sensitive Spelling Correction of Clinical Free-Text with Word and Character N-Gram Embeddings. In BioNLP 2017, pages 143–148, Vancouver, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Unsupervised Context-Sensitive Spelling Correction of Clinical Free-Text with Word and Character N-Gram Embeddings (Fivez et al., BioNLP 2017)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/W17-2317.pdf
- Data:
- MIMIC-III
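The candidate generation and ranking steps summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Candidates are assumed to be in-vocabulary words within edit distance 1 of the misspelling. The context vector is assumed to be a sum of neighbouring word vectors weighted by 1/distance. A small dictionary of invented 3-dimensional vectors stands in for trained word and character n-gram embeddings. The helper names (`edits1`, `generate_candidates`, `context_vector`, `rank_candidates`) and the `window` parameter are hypothetical.

```python
import math
from typing import Container, Dict, List, Sequence, Set

Vector = List[float]
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> Set[str]:
    """Strings one deletion, adjacent transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return deletes | transposes | replaces | inserts


def generate_candidates(misspelling: str, vocabulary: Container[str]) -> List[str]:
    """Assumption: candidates are in-vocabulary words at edit distance 1."""
    return sorted(w for w in edits1(misspelling) if w in vocabulary and w != misspelling)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def context_vector(tokens: Sequence[str], index: int,
                   embeddings: Dict[str, Vector], window: int = 3) -> Vector:
    """Hypothetical weighting: sum of context word vectors scaled by 1/distance."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for offset in range(-window, window + 1):
        j = index + offset
        if offset == 0 or not 0 <= j < len(tokens):
            continue  # skip the misspelling itself and positions outside the text
        vec = embeddings.get(tokens[j])
        if vec is None:
            continue  # skip context words without a vector
        weight = 1.0 / abs(offset)
        total = [t + weight * x for t, x in zip(total, vec)]
    return total


def rank_candidates(tokens: Sequence[str], index: int,
                    embeddings: Dict[str, Vector], window: int = 3) -> List[str]:
    """Rank candidates for tokens[index] by cosine similarity to the weighted context."""
    ctx = context_vector(tokens, index, embeddings, window)
    cands = generate_candidates(tokens[index], embeddings)
    return sorted(cands, key=lambda c: cosine(embeddings[c], ctx), reverse=True)


# Toy 3-dimensional vectors, invented for illustration only.
embeddings = {
    "patient": [0.9, 0.1, 0.0],
    "denies": [0.8, 0.2, 0.1],
    "chest": [0.9, 0.0, 0.2],
    "pain": [0.95, 0.05, 0.1],
    "plan": [0.0, 0.9, 0.4],
    "piano": [0.0, 0.2, 0.9],
}
tokens = ["patient", "denies", "chest", "pian"]
print(rank_candidates(tokens, 3, embeddings))  # ['pain', 'piano', 'plan']
```

Ranking by similarity to the context alone does not consult candidate frequency, which is the property the abstract contrasts with the frequency bias of a noisy channel model. In a fastText-style model, character n-gram vectors would also give rare or unseen clinical terms a vector; the toy dictionary does not model this.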