Leonid Iomdin

Also published as: Leonid L. Iomdin


2019

2016

Microsyntactic linguistic units, such as syntactic idioms and non-standard syntactic constructions, are poorly represented in linguistic resources, mostly because the former are elements occupying an intermediate position between the lexicon and the grammar and the latter are too specific to be routinely tackled by general grammars. Consequently, many such units produce substantial gaps in systems intended to solve sophisticated computational linguistics tasks, such as parsing, deep semantic analysis, question answering, machine translation, or text generation. They also present obstacles for applying advanced techniques to these tasks, such as machine learning. The paper discusses an approach aimed at bridging such gaps, focusing on the development of monolingual and multilingual corpora where microsyntactic units are to be tagged.

2010

2008

2006

We describe a project aimed at creating a deeply annotated corpus of Russian texts. The annotation consists of comprehensive morphological marking, syntactic tagging in the form of a complete dependency tree, and semantic tagging within a restricted semantic dictionary. Syntactic tagging is using about 80 dependency relations. The syntactically annotated corpus counts more than 28,000 sentences and makes an autonomous part of the Russian National Corpus (www.ruscorpora.ru). Semantic tagging is based on an inventory of semantic features (descriptors) and a dictionary comprising about 3,000 entries, with a set of tags assigned to each lexeme and its argument slots. The set of descriptors assigned to words has been designed in such a way as to construct a linguistically relevant classification for the whole Russian vocabulary. This classification serves for discovering laws according to which the elements of various lexical and semantic classes interact in the texts. The inventory of semantic descriptors consists of two parts, object descriptors (about 90 items in total) and predicate descriptors (about a hundred). A set of semantic roles is thoroughly elaborated and contains about 50 roles.

2004

2002

2000

1999