Abstract
We report work in progress on compiling MedLexSp, a unified medical lexicon for the Spanish language, featuring terms and inflected word forms mapped to Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs), semantic types and semantic groups. First, we took a list of term lemmas and forms from a previous project and mapped them to UMLS terms and CUIs. To enrich the lexicon, we used both domain corpora (e.g. Summaries of Product Characteristics and MedlinePlus) and natural language processing techniques such as string distance methods and the generation of syntactic variants of multi-word terms. We also added term variants by mapping their CUIs to missing items available in the Spanish versions of standard thesauri (e.g. Medical Subject Headings and the World Health Organization Adverse Drug Reactions terminology). We extended the vocabulary coverage by gathering missing terms from resources such as the Anatomical Therapeutic Chemical (ATC) Classification, the National Cancer Institute (NCI) Dictionary of Cancer Terms, OrphaData, and the Nomenclátor de Prescripción for drug names. Part-of-speech information is being added to the lexicon. The current version contains 76 454 lemmas and 203 043 inflected forms (including conjugated verbs and number and gender variants), corresponding to 30 647 UMLS CUIs. MedLexSp is distributed freely for research purposes.
- Anthology ID:
- W19-5017
- Volume:
- Proceedings of the 18th BioNLP Workshop and Shared Task
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
- Venue:
- BioNLP
- SIG:
- SIGBIOMED
- Publisher:
- Association for Computational Linguistics
- Pages:
- 152–164
- URL:
- https://preview.aclanthology.org/add_missing_videos/W19-5017/
- DOI:
- 10.18653/v1/W19-5017
- Cite (ACL):
- Leonardo Campillos-Llanos. 2019. First Steps towards Building a Medical Lexicon for Spanish with Linguistic and Semantic Information. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 152–164, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- First Steps towards Building a Medical Lexicon for Spanish with Linguistic and Semantic Information (Campillos-Llanos, BioNLP 2019)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/W19-5017.pdf
- Code:
- lcampillos/bionlp2019
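
The abstract mentions string distance methods as one way of relating candidate terms to entries already in the lexicon. The following is a minimal sketch of that general idea, not the author's implementation and not code from the repository listed above. It assumes Levenshtein edit distance as the string distance; the function names, the distance threshold, and the toy lexicon with its placeholder identifiers (which are not real UMLS CUIs) are all hypothetical.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def closest_entries(
    candidate: str,
    lexicon: Dict[str, str],
    max_distance: int = 2,  # hypothetical threshold, chosen for illustration
) -> List[Tuple[str, str, int]]:
    """Return (term, identifier, distance) for entries near the candidate."""
    matches = []
    for term, identifier in lexicon.items():
        distance = levenshtein(candidate.lower(), term.lower())
        if distance <= max_distance:
            matches.append((term, identifier, distance))
    # Closest entries first; ties are broken alphabetically by term.
    return sorted(matches, key=lambda match: (match[2], match[0]))


# Toy lexicon: the identifiers are placeholders, NOT real UMLS CUIs.
TOY_LEXICON = {
    "hipertensión": "CUI-PLACEHOLDER-1",
    "hipotensión": "CUI-PLACEHOLDER-2",
    "diabetes": "CUI-PLACEHOLDER-3",
}

if __name__ == "__main__":
    # An unaccented spelling variant is matched to the accented entry.
    print(closest_entries("hipertension", TOY_LEXICON))
```

A threshold this small keeps near-identical spellings (missing accents, single typos) while rejecting most unrelated terms. In practice, candidate matches of this kind would still need manual review before being added to a lexicon, since short medical terms with different meanings can sit within one or two edits of each other.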