A Swedish Scientific Medical Corpus for Terminology Management and Linguistic Exploration

Dimitrios Kokkinakis, Ulla Gerdin


Abstract
This paper describes the development of a new Swedish scientific medical corpus. We provide a detailed description of the characteristics of this new collection as well results of an application of the corpus on term management tasks, including terminology validation and terminology extraction. Although the corpus is representative for the scientific medical domain it still covers in detail a lot of specialised sub-disciplines such as diabetes and osteoporosis which makes it suitable for facilitating the production of smaller but more focused sub-corpora. We address this issue by making explicit some features of the corpus in order to demonstrate the usability of the corpus particularly for the quality assessment of subsets of official terminologies such as the Systematized NOmenclature of MEDicine - Clinical Terms (SNOMED CT). Domain-dependent language resources, labelled or not, are a crucial key components for progressing R&D in the human language technology field since such resources are an indispensable, integrated part for terminology management, evaluation, software prototyping and design validation and a prerequisite for the development and evaluation of a number of sublanguage dependent applications including information extraction, text mining and information retrieval.
Anthology ID:
L10-1030
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/60_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Dimitrios Kokkinakis and Ulla Gerdin. 2010. A Swedish Scientific Medical Corpus for Terminology Management and Linguistic Exploration. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
A Swedish Scientific Medical Corpus for Terminology Management and Linguistic Exploration (Kokkinakis & Gerdin, LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/60_Paper.pdf