Petr Sgall

Also published as: P. Sgall


2012

We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation. The English part consists of the Wall Street Journal (WSJ) section of the Penn Treebank. The Czech part was translated from the English source sentence by sentence. This paper gives a high-level overview of the underlying linguistic theory (the so-called tectogrammatical annotation), with some details of its most important features, such as valency annotation, ellipsis reconstruction, and coreference.

2006

In the present contribution we claim that corpus annotation serves, among other things, as an invaluable test for the linguistic theories behind annotation schemes, and as such represents an irreplaceable source of linguistic information for building grammars. To support this claim we present four linguistic phenomena whose study and adequate description in grammar have gained important observations from the deep layer of corpus annotation introduced in the Prague Dependency Treebank: the information structure of the sentence, the condition of projectivity and word order, types of dependency relations, and textual coreference.

2000

After a brief characterization of the theory of the topic-focus articulation of the sentence (TFA), rules are formulated that determine the assignment of appropriate values of the TFA attribute in the process of syntactico-semantic tagging of a very large corpus of Czech.
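
The abstract does not reproduce the rules themselves; the following Python sketch only illustrates the general idea of assigning TFA values during tagging. The Node class, the value inventory (t for topic, c for contrastive topic, f for focus), and the purely position-based heuristic are simplifying assumptions made for this illustration; the actual rules are far richer and consult further cues such as word order and the placement of the intonation centre.

# Minimal illustrative sketch (not the published rule set): assign hypothetical
# TFA values 't' (topic), 'c' (contrastive topic) and 'f' (focus) to the nodes
# of a sentence, using only surface position relative to the main verb.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Node:
    form: str                   # surface word form
    order: int                  # surface word-order position
    is_verb: bool = False
    contrastive: bool = False   # e.g. marked by a contrastive particle

def assign_tfa(nodes: List[Node]) -> Dict[str, str]:
    """Return a mapping from word form to a TFA value."""
    verb_pos = next(n.order for n in nodes if n.is_verb)
    tfa = {}
    for n in nodes:
        if n.is_verb:
            tfa[n.form] = "f"                               # verb counted as focal here
        elif n.order < verb_pos:
            tfa[n.form] = "c" if n.contrastive else "t"     # preverbal -> topic
        else:
            tfa[n.form] = "f"                               # postverbal -> focus
    return tfa

# Toy Czech sentence: "Jan vcera koupil knihu" (Jan bought a book yesterday)
sentence = [
    Node("Jan", 1), Node("vcera", 2),
    Node("koupil", 3, is_verb=True), Node("knihu", 4),
]
print(assign_tfa(sentence))   # {'Jan': 't', 'vcera': 't', 'koupil': 'f', 'knihu': 'f'}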

1995

The dichotomy of topic and focus, based in the Praguean Functional Generative Description on the scale of communicative dynamism, is relevant not only for the possible placement of the sentence in a context, but also for its semantic interpretation. Automatic identification of topic and focus may use input information on word order, on the systemic ordering of kinds of complementations (reflected by the underlying order of the items included in the focus), on definiteness, and on the lexical semantic properties of words. An algorithm for the analysis of English sentences has been implemented; it is discussed and illustrated with several examples.
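
As a rough illustration of the kind of input mentioned here (word order, systemic ordering of complementations, definiteness), the following Python sketch shows one possible topic-focus split. The functor labels, the toy systemic ordering, and the split_topic_focus heuristic are assumptions made for the example, not the implemented algorithm.

# Simplified sketch: split a verb's complementations into topic and focus
# using surface position, definiteness, and a fixed "systemic ordering"
# of complementation types (a toy subset, for illustration only).

SYSTEMIC_ORDERING = ["ACT", "TWHEN", "LOC", "ADDR", "PAT", "DIR"]

def split_topic_focus(complementations, verb_position):
    """complementations: list of dicts with 'functor', 'position', 'definite'."""
    topic, focus = [], []
    for c in complementations:
        if c["position"] < verb_position:
            # preverbal definite items are prototypically contextually bound
            (topic if c["definite"] else focus).append(c)
        else:
            focus.append(c)
    # within the focus, items follow the underlying (systemic) ordering
    focus.sort(key=lambda c: SYSTEMIC_ORDERING.index(c["functor"]))
    return topic, focus

# "John met a girl in the park yesterday."
comps = [
    {"functor": "ACT",   "position": 1, "definite": True},   # John
    {"functor": "PAT",   "position": 3, "definite": False},  # a girl
    {"functor": "LOC",   "position": 4, "definite": True},   # in the park
    {"functor": "TWHEN", "position": 5, "definite": False},  # yesterday
]
topic, focus = split_topic_focus(comps, verb_position=2)
print([c["functor"] for c in topic])   # ['ACT']
print([c["functor"] for c in focus])   # ['TWHEN', 'LOC', 'PAT']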

1993

An algorithm for automatic identification of the topic and focus of a sentence is presented; it is based on dependency syntax and uses written input, which is much more ambiguous than a spoken utterance.

1983

In the present paper we characterize in more detail some aspects of a question answering system that takes as its starting point the underlying structure of sentences (which, in some approaches, can be identified with the level of meaning or of logical form). First, the criteria used to identify the elementary units of the underlying structure and the operations conjoining them into complex units are described (Sect. 1); then the main types of units and operations resulting from an empirical investigation based on these criteria are listed (Sect. 2); finally, the rules of inference, accounting for the relevant aspects of the relationship between linguistic and cognitive structures, are illustrated (Sect. 3).
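
To make the role of such inference rules concrete, here is a hedged Python sketch, not the system described in the paper: propositions derived from underlying sentence structures are stored as triples, and a small set of rules closes the knowledge base before a question is matched against it. The predicates part_of and in and both rules are invented for the example.

# Schematic illustration of inference over a triple store.

def infer(facts):
    """Apply the inference rules until no new triples are produced."""
    rules = [
        # location is transitive: (x in y) & (y in z) => (x in z)
        lambda fs: {(a, "in", d) for (a, r1, b) in fs for (c, r2, d) in fs
                    if r1 == r2 == "in" and b == c},
        # a part of something located somewhere is located there, too
        lambda fs: {(a, "in", d) for (a, r1, b) in fs for (c, r2, d) in fs
                    if r1 == "part_of" and r2 == "in" and b == c},
    ]
    closed = set(facts)
    while True:
        new = set().union(*(rule(closed) for rule in rules)) - closed
        if not new:
            return closed
        closed |= new

facts = {
    ("tower", "part_of", "castle"),
    ("castle", "in", "Prague"),
    ("Prague", "in", "Bohemia"),
}
kb = infer(facts)
# Answering "Where is the tower?" now finds ('tower', 'in', 'Prague')
# and ('tower', 'in', 'Bohemia') in the closed knowledge base.
print(sorted(t for t in kb if t[0] == "tower" and t[1] == "in"))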

1980

The necessity of, and means for, distinguishing between a level of linguistic meaning and a domain of "factual knowledge" (or cognitive content) are argued for, supported by a survey of relevant operational criteria. The level of meaning is characterized as a safe basis for computational applications, allowing for a set of inference rules that account for the content (factual relations) of a given domain.
