First Steps towards Building a Medical Lexicon for Spanish with Linguistic and Semantic Information

Leonardo Campillos-Llanos


Abstract
We report ongoing work on compiling MedLexSp, a unified medical lexicon for the Spanish language, featuring terms and inflected word forms mapped to Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs), semantic types and groups. First, we leveraged a list of term lemmas and forms from a previous project, and mapped them to UMLS terms and CUIs. To enrich the lexicon, we used both domain corpora (e.g., Summaries of Product Characteristics and MedlinePlus) and natural language processing techniques such as string distance methods or generation of syntactic variants of multi-word terms. We also added term variants by mapping their CUIs to missing items available in the Spanish versions of standard thesauri (e.g., Medical Subject Headings and World Health Organization Adverse Drug Reactions terminology). We enhanced the vocabulary coverage by gathering missing terms from resources such as the Anatomical Therapeutical Classification, the National Cancer Institute (NCI) Dictionary of Cancer Terms, OrphaData, or the Nomenclátor de Prescripción for drug names. Part-of-Speech information is being added to the lexicon. The current version contains 76 454 lemmas and 203 043 inflected forms (including conjugated verbs and number and gender variants), corresponding to 30 647 UMLS CUIs. MedLexSp is distributed freely for research purposes.
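The string distance step mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the Levenshtein metric, the threshold of one edit, the function names, and the toy lexicon with placeholder identifiers are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_entry(candidate: str,
                  lexicon: Dict[str, str],
                  max_distance: int = 1) -> Optional[Tuple[str, str, int]]:
    """Return (term, identifier, distance) for the nearest lexicon term
    within max_distance edits, or None when nothing is close enough."""
    best = None
    for term, identifier in lexicon.items():
        distance = levenshtein(candidate.lower(), term.lower())
        if distance <= max_distance and (best is None or distance < best[2]):
            best = (term, identifier, distance)
    return best


# Hypothetical toy lexicon: the identifiers are placeholders, not real UMLS CUIs.
TOY_LEXICON = {
    "hemorragia": "CUI_PLACEHOLDER_1",
    "cefalea": "CUI_PLACEHOLDER_2",
}

if __name__ == "__main__":
    print(closest_entry("hemorragias", TOY_LEXICON))
    print(closest_entry("fiebre", TOY_LEXICON))
```

A tight threshold keeps false matches down at the cost of missing more distant variants; in practice, candidates found this way would still need manual review before being added to a lexicon.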
Anthology ID:
W19-5017
Volume:
Proceedings of the 18th BioNLP Workshop and Shared Task
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
152–164
URL:
https://aclanthology.org/W19-5017
DOI:
10.18653/v1/W19-5017
Cite (ACL):
Leonardo Campillos-Llanos. 2019. First Steps towards Building a Medical Lexicon for Spanish with Linguistic and Semantic Information. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 152–164, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
First Steps towards Building a Medical Lexicon for Spanish with Linguistic and Semantic Information (Campillos-Llanos, BioNLP 2019)
PDF:
https://preview.aclanthology.org/ingest-bitext-workshop/W19-5017.pdf
Code:
lcampillos/bionlp2019