System Evaluation on a Named Entity Corpus from Clinical Notes
Karin Schuler, Vinod Kaggal, James Masanz, Philip Ogren, Guergana Savova
Abstract
This paper presents the evaluation of the dictionary look-up component of Mayo Clinic's Information Extraction system. The component was tested on a corpus of 160 free-text clinical notes that were manually annotated for the named entity type disease. This kind of clinical text presents many language challenges, such as fragmented sentences and heavy use of abbreviations and acronyms. The dictionary used for this evaluation was a subset of SNOMED-CT with semantic types corresponding to diseases/disorders, without any augmentation. The algorithm achieves an F-score of 0.56 for exact matches and F-scores of 0.76 and 0.62 for right- and left-partial matches, respectively. Machine learning techniques are currently under investigation to improve this task.
- Anthology ID:
- L08-1365
- Volume:
- Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
- Month:
- May
- Year:
- 2008
- Address:
- Marrakech, Morocco
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- URL:
- http://www.lrec-conf.org/proceedings/lrec2008/pdf/764_paper.pdf
- Cite (ACL):
- Karin Schuler, Vinod Kaggal, James Masanz, Philip Ogren, and Guergana Savova. 2008. System Evaluation on a Named Entity Corpus from Clinical Notes. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
- Cite (Informal):
- System Evaluation on a Named Entity Corpus from Clinical Notes (Schuler et al., LREC 2008)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2008/pdf/764_paper.pdf
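
The abstract reports F-scores for exact, right-partial, and left-partial matches but does not define these criteria. The following is a minimal sketch of span-level scoring under stated assumptions, not the authors' evaluation code: an annotation is taken to be a `(begin, end)` character-offset pair, a right-partial match is assumed to mean that the end offsets agree, and a left-partial match that the begin offsets agree. All function names are hypothetical.

```python
from typing import Callable, Set, Tuple

# A span is a (begin, end) pair of character offsets (assumed representation).
Span = Tuple[int, int]


def exact(gold: Span, pred: Span) -> bool:
    """Both boundaries agree."""
    return gold == pred


def right_partial(gold: Span, pred: Span) -> bool:
    """Assumed definition: the right (end) boundaries agree."""
    return gold[1] == pred[1]


def left_partial(gold: Span, pred: Span) -> bool:
    """Assumed definition: the left (begin) boundaries agree."""
    return gold[0] == pred[0]


def f_score(gold: Set[Span], pred: Set[Span],
            match: Callable[[Span, Span], bool]) -> float:
    """Balanced F-score (harmonic mean of precision and recall)."""
    # Precision: fraction of system spans that match some gold span.
    matched_pred = sum(1 for p in pred if any(match(g, p) for g in gold))
    # Recall: fraction of gold spans that are matched by some system span.
    matched_gold = sum(1 for g in gold if any(match(g, p) for p in pred))
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    gold_spans = {(0, 5), (10, 20), (30, 40)}
    system_spans = {(0, 5), (12, 20), (30, 38), (50, 55)}
    for name, criterion in [("exact", exact),
                            ("right-partial", right_partial),
                            ("left-partial", left_partial)]:
        print(name, round(f_score(gold_spans, system_spans, criterion), 2))
```

Under these assumptions every exact match also counts as a right-partial and a left-partial match, so the partial scores can never fall below the exact score, which is consistent with the ordering of the figures in the abstract.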