Jonatan Uppström


2013

Korp and Karp – A Bestiary of Language Resources: The Research Infrastructure of Språkbanken
Malin Ahlberg | Lars Borin | Markus Forsberg | Martin Hammarstedt | Leif-Jöran Olsson | Olof Olsson | Johan Roxendal | Jonatan Uppström
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

2012

The open lexical infrastructure of Språkbanken
Lars Borin | Markus Forsberg | Leif-Jöran Olsson | Jonatan Uppström
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present our ongoing work on Karp, the open lexical infrastructure of Språkbanken (the Swedish Language Bank), which has two main functions: (1) to support the work on creating, curating, and integrating our various lexical resources; and (2) to publish daily versions of the resources, making them searchable and downloadable. An important requirement for the lexical infrastructure is also that we maintain a strong bidirectional connection to our corpus infrastructure. At the heart of the infrastructure is the SweFN++ project, whose goal is to create free Swedish lexical resources geared towards language technology applications. The infrastructure currently hosts 15 Swedish lexical resources, including historical ones, some of which have been created from scratch using existing free resources, both external and in-house. The resources are integrated through links to a pivot lexical resource, SALDO, a large morphological and lexical-semantic resource for modern Swedish. SALDO has been selected as the pivot partly because of its size and quality, but also because its form and sense units have been assigned persistent identifiers (PIDs) to which the lexical information in other lexical resources and in corpora is linked.