Building Slovene WordNet

Tomaž Erjavec, Darja Fišer


Abstract
A WordNet is a lexical database in which nouns, verbs, adjectives and adverbs are organized in a conceptual hierarchy, linking semantically and lexically related concepts. Such semantic lexicons have become oneof the most valuable resources for a wide range of NLP research and applications, such as semantic tagging, automatic word-sense disambiguation, information retrieval and document summarisation. Following the WordNet design for the English languagedeveloped at Princeton, WordNets for a number of other languages havebeen developed in the past decade, taking the idea into the domain ofmultilingual processing. This paper reports on the prototype SloveneWordNet which currently contains about 5,000 top-level concepts. Theresource has been automatically translated from the Serbian WordNet, with the help of a bilingual dictionary, synset literals ranked according to the frequency of corpus occurrence, and results manually corrected. The paper presents the results obtained, discusses some problems encountered along the way and points out some possibilitiesof automated acquisition and refinement of synsets in the future.
Anthology ID:
L06-1079
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/150_pdf.pdf
DOI:
Bibkey:
Cite (ACL):
Tomaž Erjavec and Darja Fišer. 2006. Building Slovene WordNet. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
Building Slovene WordNet (Erjavec & Fišer, LREC 2006)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/150_pdf.pdf