Igor Boguslavsky

Also published as: Igor M. Boguslavsky


2019

2017

2016

Valency slot filling is a semantic glue, which brings together the meanings of words. As regards the position of an argument in the dependency structure with respect to its predicate, there exist three types of valency filling: active (canonical), passive, and discontinuous. Of these, the first type is studied much better than the other two. As a rule, canonical actants are unambiguously marked in the syntactic structure, and each actant corresponds to a unique syntactic position. Linguistic information on which syntactic function an actant might have (subject, direct or indirect object), what its morphological form should be and which prepositions or conjunctions it requires, can be given in the lexicon in the form of government patterns, subcategorization frames, or similar data structures. We concentrate on non-canonical cases of valency filling in Russian, which are characteristic of non-verbal parts of speech, such as adverbs, adjectives, and particles, in the first place. They are more difficult to handle than canonical ones, because the position of the actant in the tree is governed by more complicated rules. A valency may be filled by expressions occupying different syntactic positions, and a syntactic position may accept expressions filling different valencies of the same word. We show how these phenomena can be processed in a semantic analyzer.

2015

2014

2013

Joint morphological and syntactic analysis has been proposed as a way of improving parsing accuracy for richly inflected languages. Starting from a transition-based model for joint part-of-speech tagging and dependency parsing, we explore different ways of integrating morphological features into the model. We also investigate the use of rule-based morphological analyzers to provide hard or soft lexical constraints and the use of word clusters to tackle the sparsity of lexical features. Evaluation on five morphologically rich languages (Czech, Finnish, German, Hungarian, and Russian) shows consistent improvements in both morphological and syntactic accuracy for joint prediction over a pipeline model, with further improvements thanks to lexical constraints and word clusters. The final results improve the state of the art in dependency parsing for all languages.

2010

2008

2006

We describe a project aimed at creating a deeply annotated corpus of Russian texts. The annotation consists of comprehensive morphological marking, syntactic tagging in the form of a complete dependency tree, and semantic tagging within a restricted semantic dictionary. Syntactic tagging is using about 80 dependency relations. The syntactically annotated corpus counts more than 28,000 sentences and makes an autonomous part of the Russian National Corpus (www.ruscorpora.ru). Semantic tagging is based on an inventory of semantic features (descriptors) and a dictionary comprising about 3,000 entries, with a set of tags assigned to each lexeme and its argument slots. The set of descriptors assigned to words has been designed in such a way as to construct a linguistically relevant classification for the whole Russian vocabulary. This classification serves for discovering laws according to which the elements of various lexical and semantic classes interact in the texts. The inventory of semantic descriptors consists of two parts, object descriptors (about 90 items in total) and predicate descriptors (about a hundred). A set of semantic roles is thoroughly elaborated and contains about 50 roles.

2004

2002

2000

1999

1995