2015
Bilingual English-Czech Valency Lexicon Linked to a Parallel Corpus
Zdeňka Urešová | Ondřej Dušek | Eva Fučíková | Jan Hajič | Jana Šindlerová
Proceedings of the 9th Linguistic Annotation Workshop
Using Parallel Texts and Lexicons for Verbal Word Sense Disambiguation
Ondřej Dušek | Eva Fučíková | Jan Hajič | Martin Popel | Jana Šindlerová | Zdeňka Urešová
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)
Zero Alignment of Verb Arguments in a Parallel Treebank
Jana Šindlerová | Eva Fučíková | Zdeňka Urešová
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)
2014
Resources in Conflict: A Bilingual Valency Lexicon vs. a Bilingual Treebank vs. a Linguistic Theory
Jana Šindlerová | Zdeňka Urešová | Eva Fučíková
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In this paper, we illustrate how a syntactically annotated bilingual treebank can help in exploring and revising a developed linguistic theory. Using material from the Prague Czech-English Dependency Treebank, we observe sentences in which an Addressee argument in one language is translationally linked to a Patient argument in the other, and we make generalizations about the theoretical grounds of these argument non-correspondences and their relation to the valency theory beyond the annotation practice. Exploring verbs of three semantic classes (Judgement verbs, Teaching verbs and Attempt Suasion verbs), we claim that the Functional Generative Description argument labelling is highly dependent on the morphosyntactic realization of the individual participants, which in turn results in valency frame differences. Nevertheless, most of the differences can be overcome without substantial changes to the linguistic theory itself.
2013
An Analysis of Annotation of Verb-Noun Idiomatic Combinations in a Parallel Dependency Corpus
Zdeňka Urešová | Jan Hajič | Eva Fučíková | Jana Šindlerová
Proceedings of the 9th Workshop on Multiword Expressions
2012
Announcing Prague Czech-English Dependency Treebank 2.0
Jan Hajič | Eva Hajičová | Jarmila Panevová | Petr Sgall | Ondřej Bojar | Silvie Cinková | Eva Fučíková | Marie Mikulová | Petr Pajas | Jan Popelka | Jiří Semecký | Jana Šindlerová | Jan Štěpánek | Josef Toman | Zdeňka Urešová | Zdeněk Žabokrtský
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation. The English part consists of the Wall Street Journal (WSJ) section of the Penn Treebank. The Czech part was translated from the English source sentence by sentence. This paper gives a high-level overview of the underlying linguistic theory (the so-called tectogrammatical annotation), with some details of the most important features such as valency annotation, ellipsis reconstruction and coreference.
2010
Building a Bilingual ValLex Using Treebank Token Alignment: First Observations
Jana Šindlerová | Ondřej Bojar
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
We explore the potential and limitations of building a bilingual valency lexicon based on the alignment of nodes in a parallel treebank. Our aim is to build an electronic Czech->English Valency Lexicon by collecting equivalences from bilingual treebank data and storing them in two existing electronic valency lexicons, PDT-VALLEX and Engvallex. For this task, a special annotation interface has been built upon the TrEd editor, allowing quick and easy collection of frame equivalences in either of the source lexicons. The issues encountered so far include technical limitations, theory-dependent limitations, and limitations on the achievable quality of human annotation. The issues of special interest to both linguists and MT specialists involved in the project include linguistically motivated imbalance between the frame equivalents, either in the number or in the type of valency participants. The first phases of annotation so far support the assumption that there is a unique correspondence between the functors of the translation-equivalent frames. Also, hardly any linguistically significant imbalance between the frames has been found, which is partly a promising result for the linguistic theory used and partly a consequence of the low stylistic variety of the annotated corpus texts.