Špela Vintar

Also published as: Spela Vintar


The NetViz terminology visualization tool and the use cases in karstology domain modeling
Senja Pollak | Vid Podpečan | Dragana Miljkovic | Uroš Stepišnik | Špela Vintar
Proceedings of the 6th International Workshop on Computational Terminology

We present the NetViz terminology visualization tool and apply it to the domain modeling of karstology, a subfield of geography studying karst phenomena. The developed tool allows for high-performance online network visualization where the user can upload the terminological data in a simple CSV format, define the nodes (terms, categories), edges (relations) and their properties (by assigning different node colors), and then edit and interactively explore domain knowledge in the form of a network. We showcase the usefulness of the tool on examples from the karstology domain, where in the first use case we visualize the domain knowledge as represented in a manually annotated corpus of domain definitions, while in the second use case we show the power of visualization for domain understanding by visualizing automatically extracted knowledge in the form of triplets extracted from the karstology domain corpus. The application is entirely web-based without any need for downloading or special configuration. The source code of the web application is also available under the permissive MIT license, allowing future extensions for developing new terminological applications.

Mining Semantic Relations from Comparable Corpora through Intersections of Word Embeddings
Špela Vintar | Larisa Grčić Simeunović | Matej Martinc | Senja Pollak | Uroš Stepišnik
Proceedings of the 13th Workshop on Building and Using Comparable Corpora

We report an experiment aimed at extracting words expressing a specific semantic relation using intersections of word embeddings. In a multilingual frame-based domain model, specific features of a concept are typically described through a set of non-arbitrary semantic relations. In karstology, our domain of choice which we are exploring through a comparable corpus in English and Croatian, karst phenomena such as landforms are usually described through their FORM, LOCATION, CAUSE, FUNCTION and COMPOSITION. We propose an approach to mine words pertaining to each of these relations by using a small number of seed adjectives, for which we retrieve closest words using word embeddings and then use intersections of these neighbourhoods to refine our search. Such cross-language expansion of semantically rich vocabulary is a valuable aid in improving the coverage of a multilingual knowledge base, but also in exploring differences between languages in their respective conceptualisations of the domain.
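The neighbourhood-intersection idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors, the vocabulary, and the helper names `cosine`, `neighbours` and `intersect_neighbourhoods` are all assumptions made for illustration, and a real experiment would load pretrained embeddings for each language.

```python
import math

# Toy embeddings invented for illustration only (assumption, not corpus data).
EMBEDDINGS = {
    "round": [0.9, 0.1, 0.0],
    "circular": [0.85, 0.15, 0.05],
    "oval": [0.8, 0.2, 0.1],
    "elongated": [0.7, 0.3, 0.1],
    "coastal": [0.1, 0.9, 0.1],
    "alpine": [0.05, 0.95, 0.0],
    "steep": [0.6, 0.4, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def neighbours(word, k):
    """Return the set of the k words most similar to `word`."""
    scored = [
        (cosine(EMBEDDINGS[word], vec), other)
        for other, vec in EMBEDDINGS.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return {other for _, other in scored[:k]}


def intersect_neighbourhoods(seeds, k):
    """Keep only words that occur in the k-neighbourhood of every seed."""
    common = set.intersection(*(neighbours(seed, k) for seed in seeds))
    return common - set(seeds)


# Hypothetical seed adjectives for the FORM relation.
form_candidates = intersect_neighbourhoods(["round", "elongated"], k=3)
print(sorted(form_candidates))
```

Intersecting the neighbourhoods filters out words that are close to only one seed, which is the refinement step the abstract describes; the choice of `k` and of the seeds is left open here.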


Neural Machine Translation of Literary Texts from English to Slovene
Taja Kuzman | Špela Vintar | Mihael Arčan
Proceedings of the Qualities of Literary Machine Translation


Proceedings of Language, Ontology, Terminology and Knowledge Structures Workshop (LOTKS 2017)
Francesca Frontini | Larisa Grčić Simeunović | Špela Vintar | Anas Fahad Khan | Artemis Parvisi
Proceedings of Language, Ontology, Terminology and Knowledge Structures Workshop (LOTKS 2017)


Were the clocks striking or surprising? Using WSD to improve MT performance
Špela Vintar | Darja Fišer | Aljoša Vrščaj
Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)


Building and Using Comparable Corpora for Domain-Specific Bilingual Lexicon Extraction
Darja Fišer | Nikola Ljubešić | Špela Vintar | Senja Pollak
Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web


Learning to Mine Definitions from Slovene Structured and Unstructured Knowledge-Rich Resources
Darja Fišer | Senja Pollak | Špela Vintar
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The paper presents an innovative approach to extracting Slovene definition candidates from domain-specific corpora using morphosyntactic patterns, automatic terminology recognition and semantic tagging with wordnet senses. First, a classification model was trained on examples from Slovene Wikipedia, which was then used to find well-formed definitions among the extracted candidates. The results of the experiment are encouraging, with accuracy ranging from 67% to 71%. The paper also addresses some drawbacks of the approach and suggests ways to overcome them in future work.


Harvesting Multi-Word Expressions from Parallel Corpora
Špela Vintar | Darja Fišer
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The paper presents a set of approaches to extend the automatically created Slovene wordnet with nominal multi-word expressions. In the first approach multi-word expressions from Princeton WordNet are translated with a technique that is based on word-alignment and lexico-syntactic patterns. This is followed by extracting new terms from a monolingual corpus using keywordness ranking and contextual patterns. Finally, the multi-word expressions are assigned a hypernym and added to our wordnet. Manual evaluation and comparison of the results show that the translation approach is the most straightforward and accurate. However, it is successfully complemented by the two monolingual approaches, which are able to identify more term candidates in the corpus that would otherwise go unnoticed. Some weaknesses of the proposed wordnet extension techniques are also addressed.


Evaluation Resources for Concept-based Cross-Lingual Information Retrieval in the Medical Domain
Paul Buitelaar | Diana Steffen | Martin Volk | Dominic Widdows | Bogdan Sacaleanu | Špela Vintar | Stanley Peters | Hans Uszkoreit
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)


Evaluation Corpora for Sense Disambiguation in the Medical Domain
Diana Raileanu | Paul Buitelaar | Spela Vintar | Jörg Bay
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

An Efficient and Flexible Format for Linguistic and Semantic Annotation
Špela Vintar | Paul Buitelaar | Bärbel Ripplinger | Bogdan Sacaleanu | Diana Raileanu | Detlef Prescher
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)


Extracting Terms and Terminological Collocations from the ELAN Slovene–English Parallel Corpus
Špela Vintar
5th EAMT Workshop: Harvesting Existing Resources