Duško Vitas

Also published as: Dusko Vitas


2022

Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection
Ranka Stanković | Cvetana Krstev | Branislava Šandrih Todorović | Dusko Vitas | Mihailo Skoric | Milica Ikonić Nešić
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper we present the Serbian part of the ELTeC multilingual corpus of novels written in the time period 1840-1920. The corpus is being built in order to test various distant reading methods and tools with the aim of rethinking European literary history. We present the various steps that led to the production of the Serbian sub-collection: novel selection and retrieval, text preparation, structural annotation, POS-tagging, lemmatization and named entity recognition. The Serbian sub-collection was published on different platforms in order to make it freely available to various users. Several use examples show that this sub-collection is useful for both close and distant reading approaches.

A Myriad of Ways to Say: “Wear a mask!”
Cvetana Krstev | Duško Vitas
Proceedings of the 5th International Conference on Computational Linguistics in Bulgaria (CLIB 2022)

This paper presents a small corpus of notices displayed at entrances of various Belgrade public premises asking those who enter to wear a mask. We analyze various aspects of these notices: their physical appearance, script, lexica, syntax and style. Special attention is paid to the obligatory and optional parts of these notices. Obligatory parts deal with wearing masks, keeping the distance, limiting the number of persons on the premises and using disinfection. We developed local grammars that model phrases requiring the wearing of masks; these grammars can be used both for recognition and for the generation of paraphrases.
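
As a rough illustration of how one grammar can serve both recognition and paraphrase generation, the sketch below encodes a hypothetical toy grammar as a sequence of slots, each listing interchangeable alternatives (an empty string marks an optional slot). The grammar, its English placeholder phrases, and the names `GRAMMAR`, `generate` and `compile_recognizer` are assumptions for illustration only; the actual local grammars model Serbian notices and are not reproduced here.

```python
import itertools
import re

# Hypothetical toy grammar: each slot lists interchangeable alternatives;
# "" makes a slot optional. English placeholders stand in for Serbian phrases.
GRAMMAR = [
    ["", "please "],
    ["wear", "put on"],
    [" a"],
    [" mask", " face mask", " protective mask"],
    ["", " before entering"],
]

def generate(grammar):
    """Enumerate every phrase the grammar describes (paraphrase generation)."""
    for parts in itertools.product(*grammar):
        yield "".join(parts)

def compile_recognizer(grammar):
    """Compile the same grammar into a case-insensitive regex for recognition."""
    pattern = "".join(
        "(?:" + "|".join(re.escape(alt) for alt in slot) + ")" for slot in grammar
    )
    return re.compile(pattern, re.IGNORECASE)

phrases = list(generate(GRAMMAR))
recognizer = compile_recognizer(GRAMMAR)

print(len(phrases))  # 24 = 2 * 2 * 1 * 3 * 2
print(bool(recognizer.fullmatch("Please put on a face mask before entering")))  # True
```

Keeping a single description and deriving both the generator and the recognizer from it means the two directions cannot drift apart, which is the point of using one grammar for both tasks.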

2020

Analysis of Similes in Serbian Literary Texts (1860-1920) using computational methods
Cvetana Krstev | Jelena Jaćimović | Duško Vitas
Proceedings of the 4th International Conference on Computational Linguistics in Bulgaria (CLIB 2020)

Similes are rhetorical figures which play an important role in literary texts. This paper presents a finite-state methodology developed for the description of adjectival similes, which enables their retrieval and annotation in Serbian novels written in the mid-19th and early 20th centuries. The results of a textometric analysis reveal the most frequent adjectival similes and the specificity of their usage, with respect to the author, title, or publication date, in a subset of the SrpELTeC corpus.
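
To give a concrete sense of what retrieving and counting adjectival similes involves, here is a minimal sketch that matches the fixed pattern "ADJ kao NOUN" over tokenized text and tallies (adjective, noun) lemma pairs. The toy lexicon and the function name are hypothetical; the work described above relies on finite-state grammars over full Serbian e-dictionaries, which handle far more variation (inflection, agreement, intervening words) than this fixed three-token window.

```python
from collections import Counter

# Hypothetical toy lexicon: word form -> (lemma, coarse POS), A = adjective,
# N = noun. A real system would consult a full inflectional e-dictionary.
LEXICON = {
    "beo": ("beo", "A"), "belo": ("beo", "A"), "bela": ("beo", "A"),
    "hladan": ("hladan", "A"), "hladna": ("hladan", "A"), "hladne": ("hladan", "A"),
    "sneg": ("sneg", "N"), "led": ("led", "N"),
}

def find_adjectival_similes(tokens):
    """Yield (adjective lemma, noun lemma) for each 'ADJ kao NOUN' token window."""
    for adj, conj, noun in zip(tokens, tokens[1:], tokens[2:]):
        a = LEXICON.get(adj.lower())
        n = LEXICON.get(noun.lower())
        if conj.lower() == "kao" and a and n and a[1] == "A" and n[1] == "N":
            yield a[0], n[0]

sentences = [
    "Lice joj je bilo belo kao sneg , a ruke hladne kao led .",
    "Ruka mu je bila hladna kao led .",
]
counts = Counter(
    pair for s in sentences for pair in find_adjectival_similes(s.split())
)
print(counts.most_common())  # [(('hladan', 'led'), 2), (('beo', 'sneg'), 1)]
```

Counting by lemma rather than by surface form lets "belo kao sneg" and "bela kao sneg" fall under the same simile, which is what a frequency analysis by author, title or date would need.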

2014

Enriching SerbianWordNet and Electronic Dictionaries with Terms from the Culinary Domain
Staša Vujičić Stanković | Cvetana Krstev | Duško Vitas
Proceedings of the Seventh Global Wordnet Conference

2011

A tagged and aligned corpus for the study of Proper Names in translation
Emeline Lecuit | Denis Maurel | Duško Vitas
Proceedings of the Second Workshop on Annotation and Exploitation of Parallel Corpora

E-Dictionaries and Finite-State Automata for the Recognition of Named Entities
Cvetana Krstev | Duško Vitas | Ivan Obradović | Miloš Utvić
Proceedings of the 9th International Workshop on Finite State Methods and Natural Language Processing

2010

A Description of Morphological Features of Serbian: a Revision using Feature System Declaration
Cvetana Krstev | Ranka Stanković | Duško Vitas
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the problems encountered on Serbian. We have identified four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories that some descriptions do not capture, and the lack of support for some types of categories. At the same time, different descriptions often describe exactly the same morphological property using different approaches. We propose a new morphological description for Serbian following the feature structure representation defined by the ISO standard. In this description we try to incorporate all characteristics of Serbian that need to be specified for various applications. We have developed several XSLT scripts that transform our description into the descriptions needed by various applications. We have developed the first version of this new description, but we treat it as an ongoing project because for some properties we have not yet found a satisfactory solution.
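
The idea of keeping one rich feature-based description and deriving application-specific tags from it can be sketched as follows. This uses a Python dict in place of an ISO feature structure and a Python function in place of the XSLT scripts; the position table is a hypothetical, much-simplified MULTEXT-East-style scheme for nouns, not the actual specification.

```python
# Hypothetical, simplified positional scheme for nouns (MULTEXT-East style):
# each position names a feature and maps its values to one-letter codes.
NOUN_POSITIONS = [
    ("type", {"common": "c", "proper": "p"}),
    ("gender", {"masculine": "m", "feminine": "f", "neuter": "n"}),
    ("number", {"singular": "s", "plural": "p"}),
    ("case", {"nominative": "n", "genitive": "g", "dative": "d",
              "accusative": "a", "vocative": "v", "locative": "l",
              "instrumental": "i"}),
]

def to_positional_tag(fs):
    """Map a noun feature structure (a dict) to a positional tag like 'Ncmsn'.

    Unspecified features become '-', and trailing '-' are dropped. A value
    missing from a table raises KeyError, i.e. the target scheme lacks it.
    """
    if fs.get("pos") != "noun":
        raise ValueError("this sketch only covers nouns")
    codes = [table[fs[name]] if name in fs else "-"
             for name, table in NOUN_POSITIONS]
    return ("N" + "".join(codes)).rstrip("-")

print(to_positional_tag({"pos": "noun", "type": "common", "gender": "masculine",
                         "number": "singular", "case": "nominative"}))  # Ncmsn
print(to_positional_tag({"pos": "noun", "type": "common", "case": "genitive"}))  # Nc--g
```

The KeyError path mirrors one of the problem groups named above: when the source description has a value the target scheme cannot express, the conversion has to fail or lose information.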

2009

E-Connecting Balkan Languages
Cvetana Krstev | Ranka Stanković | Duško Vitas | Svetla Koeva
Proceedings of the Workshop Multilingual resources, technologies and evaluation for central and Eastern European languages

2008

The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines
Cvetana Krstev | Ranka Stanković | Duško Vitas | Ivan Obradović
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we present how resources and tools developed within the Human Language Technology Group at the University of Belgrade can be used for tuning queries before submitting them to a web search engine. We argue that the selection of words chosen for a query, which are of paramount importance for the quality of results obtained by the query, can be substantially improved by using various lexical resources, such as morphological dictionaries and wordnets. These dictionaries enable semantic and morphological expansion of the query, the latter being very important in highly inflective languages, such as Serbian. Wordnets can also be used for adding another language to a query, if appropriate, thus making the query bilingual. Problems encountered in retrieving documents of interest are discussed and illustrated by examples. A brief description of resources is given, followed by an outline of the web tool which enables their integration. Finally, a set of examples is chosen in order to illustrate the use of the lexical resources and tool in question. Results obtained for these examples show that the number of documents obtained through a query by using our approach can double and even quadruple in some cases.
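
A minimal sketch of morphological (and optionally bilingual) query expansion, assuming a toy lemma-to-forms dictionary and a toy Serbian-to-English table standing in for aligned wordnets. The resource contents, function names, and the quoted `OR` query syntax are assumptions; real search engines differ in the operators they accept.

```python
# Hypothetical toy resources: lemma -> inflected forms, and a Serbian ->
# English table standing in for aligned wordnets.
INFLECTIONS = {
    "knjiga": ["knjiga", "knjige", "knjizi", "knjigu",
               "knjigo", "knjigom", "knjigama"],
}
TRANSLATIONS = {"knjiga": ["book", "books"]}

def expand_term(lemma, bilingual=False):
    """Return all forms of a lemma, plus translations if bilingual is set."""
    forms = list(INFLECTIONS.get(lemma, [lemma]))  # unknown lemma: keep as-is
    if bilingual:
        forms += TRANSLATIONS.get(lemma, [])
    return list(dict.fromkeys(forms))  # de-duplicate, keep order

def build_query(lemmas, bilingual=False):
    """Build one OR-group per lemma; groups are juxtaposed (implicit AND)."""
    groups = []
    for lemma in lemmas:
        alts = " OR ".join(f'"{f}"' for f in expand_term(lemma, bilingual))
        groups.append(f"({alts})")
    return " ".join(groups)

print(build_query(["knjiga"]))
# ("knjiga" OR "knjige" OR "knjizi" OR "knjigu" OR "knjigo" OR "knjigom" OR "knjigama")
```

For a highly inflective language, matching only the form the user typed would miss documents that use any of the other case forms, which is why the expanded query can retrieve many more relevant documents.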

2006

WS4LR: A Workstation for Lexical Resources
Cvetana Krstev | Ranka Stanković | Duško Vitas | Ivan Obradović
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we describe WS4LR, the workstation for lexical resources, a software tool developed within the Human Language Technology Group at the Faculty of Mathematics, University of Belgrade. The tool is aimed at manipulating heterogeneous lexical resources, and the need for such a tool came from the large volume of resources the Group has developed in the course of many years and within different projects. The tool handles morphological dictionaries, wordnets, aligned texts and transducers equally and has already proved very useful for various tasks. Although it has so far been used mainly for Serbian, WS4LR is not language dependent and can be successfully used for resources in other languages provided that they follow the described formats and methodologies. The tool operates on the .NET platform and runs on a personal computer under Windows 2000/XP/2003 operating system with at least 256MB of internal memory.

2004

Combining Heterogeneous Lexical Resources
Cvetana Krstev | Duško Vitas | Ranka Stanković | Ivan Obradović | Gordana Pavlović-Lažetić
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

The MULTEXT-East Morphosyntactic Specification for Slavic Languages
Tomaž Erjavec | Cvetana Krstev | Vladimír Petkevič | Kiril Simov | Marko Tadić | Duško Vitas
Proceedings of the 2003 EACL Workshop on Morphological Processing of Slavic Languages

Composite Tense Recognition and Tagging in Serbian
Duško Vitas | Cvetana Krstev
Proceedings of the 2003 EACL Workshop on Morphological Processing of Slavic Languages