GRISP: A Massive Multilingual Terminological Database for Scientific and Technical Domains

Patrice Lopez, Laurent Romary

[How to correct problems with metadata yourself]


Abstract
The development of a multilingual terminology is a very long and costly process. We present the creation of a multilingual terminological database called GRISP covering multiple technical and scientific fields from various open resources. A crucial aspect is the merging of the different resources which is based in our proposal on the definition of a sound conceptual model, different domain mapping and the use of structural constraints and machine learning techniques for controlling the fusion process. The result is a massive terminological database of several millions terms, concepts, semantic relations and definitions. The accuracy of the concept merging between several resources have been evaluated following several methods. This resource has allowed us to improve significantly the mean average precision of an information retrieval system applied to a large collection of multilingual and multidomain patent documents. New specialized terminologies, not specifically created for text processing applications, can be aggregated and merged to GRISP with minimal manual efforts.
Anthology ID:
L10-1573
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/829_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Patrice Lopez and Laurent Romary. 2010. GRISP: A Massive Multilingual Terminological Database for Scientific and Technical Domains. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
GRISP: A Massive Multilingual Terminological Database for Scientific and Technical Domains (Lopez & Romary, LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/829_Paper.pdf