Igor Sominsky




2010

The ConceptMapper Approach to Named Entity Recognition
Michael Tanenblatt | Anni Coden | Igor Sominsky
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

ConceptMapper is an open-source tool we created for classifying mentions in an unstructured text document based on concept terminologies (dictionaries) and yielding named entities as output. It is implemented as a UIMA (Unstructured Information Management Architecture) annotator and is highly configurable: concepts can come from standardised or proprietary terminologies; arbitrary attributes can be associated with dictionary entries, and those attributes can then be attached to the named entities in the output; numerous search strategies and search options can be specified; any tokenizer packaged as a UIMA annotator can be used to tokenize the dictionary, which guarantees the same tokenization for the input and the dictionary and minimises tokenization mismatch errors; and the types and features of the UIMA annotations used as input and generated as output can also be controlled. We describe ConceptMapper, its configuration parameters and their trade-offs, and then report an experiment in which some of these parameters are varied and precision and recall are measured on the task of identifying concepts in a collection of English-language clinical reports (colon cancer pathology). ConceptMapper is available from the Apache UIMA Sandbox under the Apache open source license.
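The central mechanism described above can be sketched outside UIMA. It tokenizes the dictionary with the same tokenizer as the input, looks up token sequences in the text, and copies each entry's attributes onto the resulting annotation. The following is a minimal, hypothetical Python illustration, not ConceptMapper's Java API or its actual matching algorithm. The regex tokenizer, the `build_dictionary` and `annotate` helpers, the `longest_match` switch (standing in for one possible search strategy), and the concept IDs are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


def tokenize(text):
    """Hypothetical stand-in for a UIMA tokenizer annotator.

    Returns lowercased word tokens with their character offsets.
    """
    return [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


@dataclass
class Entity:
    text: str          # covered text, original casing
    begin: int         # start character offset
    end: int           # end character offset (exclusive)
    concept_id: str    # hypothetical concept identifier from the dictionary
    attributes: dict   # dictionary-entry attributes copied onto the output


def build_dictionary(entries, tokenizer):
    """Tokenize each dictionary entry with the SAME tokenizer used for the input."""
    index = {}
    for surface, (concept_id, attrs) in entries.items():
        key = tuple(tok for tok, _, _ in tokenizer(surface))
        index[key] = (concept_id, attrs)
    return index


def annotate(text, index, tokenizer, longest_match=True):
    """Scan the token stream left to right, emitting one Entity per dictionary hit.

    longest_match=True prefers the longest entry starting at a token;
    False prefers the shortest (a hypothetical 'search strategy' switch).
    """
    tokens = tokenizer(text)
    max_len = max((len(k) for k in index), default=0)
    entities = []
    i = 0
    while i < len(tokens):
        limit = min(max_len, len(tokens) - i)
        lengths = range(limit, 0, -1) if longest_match else range(1, limit + 1)
        hit = None
        for n in lengths:
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            if key in index:
                hit = (n, index[key])
                break
        if hit is None:
            i += 1
            continue
        n, (concept_id, attrs) = hit
        begin, end = tokens[i][1], tokens[i + n - 1][2]
        entities.append(Entity(text[begin:end], begin, end, concept_id, dict(attrs)))
        i += n  # skip past the matched span
    return entities


# Hypothetical dictionary: surface form -> (concept id, attributes).
DICTIONARY = {
    "colon": ("CONCEPT_1", {"category": "anatomy"}),
    "Colon cancer": ("CONCEPT_2", {"category": "disease"}),
    "adenocarcinoma": ("CONCEPT_3", {"category": "histology"}),
}

if __name__ == "__main__":
    index = build_dictionary(DICTIONARY, tokenize)
    report = "Invasive Adenocarcinoma of the colon; colon cancer confirmed."
    for e in annotate(report, index, tokenize):
        print(e.begin, e.end, e.text, e.concept_id, e.attributes)
```

Because the entry "Colon cancer" is tokenized with the same `tokenize` function as the report, it matches "colon cancer" in the text. A dictionary tokenized differently (for example, case-sensitively) would miss it, which is the kind of tokenization mismatch described above.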