Abstract
A statistical study was made of the extent to which Russian nouns enter into certain kinds of syntactic combination. The basis of the study was a corpus of 180,000 running words of Russian physics text prepared for analysis by the Automatic Language Data Processing group at The Rand Corporation; for each sentence of text, the syntactic dependency of each word had been previously coded. A data retrieval program was applied, showing for each noun in the text the number of occurrences (a) with at least one genitive noun dependent, (b) with at least one adjective dependent, and (c) with either type of dependent. A listing of all nouns in the text (64,026 occurrences of 2,993 nouns) was prepared, ordered by frequency and showing the counts for a, b, and c above. Separate listings were prepared, showing for each noun that occurred 50 times or more the probability P that it would be modified in each of these three ways; these listings were ordered on P. The data suggest the following conclusions, among others: nouns vary to a statistically significant degree in how often they enter into the given combinations; the partial interchangeability of adjective and genitive noun modification is supported; a general correspondence exists between combinatorial groupings of nouns and morphological or semantic groupings (concrete nouns have low P for genitive complementation, abstract nouns have high P, etc.); the use of words in a given field of discourse can be determined empirically (e.g., the use of deverbative nouns to indicate either a process or the result of a process). It is suggested that the distributional approach is a useful supplement to traditional syntactic and semantic classification schemes, and that it is of direct utility in automatic parsing programs.
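The counting procedure described above can be illustrated with a minimal sketch. It is not the original retrieval program. It assumes each noun occurrence has already been reduced to a hypothetical record `(noun, has_genitive_dependent, has_adjective_dependent)`, that P is estimated as the relative frequency of modified occurrences, and that listings are sorted in descending order of P (the abstract does not state the direction). The function names and the record layout are illustrative only.

```python
from collections import defaultdict


def modification_counts(occurrences):
    """Tally, per noun, total occurrences (n) and counts a, b, c.

    occurrences: iterable of (noun, has_gen, has_adj) records, where the
    two flags say whether this occurrence has at least one genitive noun
    dependent or at least one adjective dependent (assumed input format).
    """
    counts = defaultdict(lambda: {"n": 0, "a": 0, "b": 0, "c": 0})
    for noun, has_gen, has_adj in occurrences:
        row = counts[noun]
        row["n"] += 1
        row["a"] += bool(has_gen)             # (a) genitive noun dependent
        row["b"] += bool(has_adj)             # (b) adjective dependent
        row["c"] += bool(has_gen or has_adj)  # (c) either type
    return dict(counts)


def frequency_listing(counts):
    """All nouns ordered by frequency, most frequent first."""
    return sorted(counts.items(), key=lambda item: item[1]["n"], reverse=True)


def probability_listing(counts, key, min_freq=50):
    """Nouns with at least min_freq occurrences, ordered on P = count / n.

    key is "a", "b", or "c". Descending order is an assumption.
    """
    rows = [
        (noun, row[key] / row["n"])
        for noun, row in counts.items()
        if row["n"] >= min_freq
    ]
    return sorted(rows, key=lambda r: r[1], reverse=True)
```

With real data, `probability_listing(counts, "a")` would give the listing for genitive modification, and the calls with `"b"` and `"c"` the other two; the threshold of 50 follows the abstract.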