Electronic medical records (EMR) have largely replaced hand-written patient files in healthcare. The growing pool of EMR data presents a significant resource in medical research, but the U.S. Health Insurance Portability and Accountability Act (HIPAA) mandates redacting medical records before performing any analysis on the same. This process complicates obtaining medical data and can remove much useful information from the record. As part of a larger project involving ontology-driven medical processing, we employ a method of recognizing protected health information (PHI) that maps to ontological terms. We then use the relationships defined in the ontology to redact medical texts so that roles and semantics of terms are retained without compromising anonymity. The method is evaluated by clinical experts on several hundred medical documents, achieving up to a 98.8% f-score, and has already shown promise for retaining semantic information in later processing.
CaseSummarizer: A System for Automated Summarization of Legal Texts
Seth Polsley | Pooja Jhunjhunwala | Ruihong Huang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations
Attorneys, judges, and others in the justice system are constantly surrounded by large amounts of legal text, which can be difficult to manage across many cases. We present CaseSummarizer, a tool for automated text summarization of legal documents which uses standard summary methods based on word frequency augmented with additional domain-specific knowledge. Summaries are then provided through an informative interface with abbreviations, significance heat maps, and other flexible controls. It is evaluated using ROUGE and human scoring against several other summarization systems, including summary text and feedback provided by domain experts.