2020
SynSemClass Linked Lexicon: Mapping Synonymy between Languages
Zdeňka Urešová, Eva Fučíková, Eva Hajičová, Jan Hajič
Proceedings of the 2020 Globalex Workshop on Linked Lexicography
This paper reports on an extended version of a synonym verb class lexicon, newly called SynSemClass (formerly CzEngClass). This lexicon stores cross-lingual semantically similar verb senses in synonym classes extracted from a richly annotated parallel corpus, the Prague Czech-English Dependency Treebank. When building the lexicon, we make use of predicate-argument relations (valency) and link them to semantic roles; in addition, each entry is linked to several external lexicons of a more or less “semantic” nature, namely FrameNet, WordNet, VerbNet, OntoNotes and PropBank, and the Czech VALLEX. The aim is to provide a linguistic resource that can be used to compare semantic roles and their syntactic properties and features across languages within and across synonym groups (classes, or ‘synsets’), as well as gold standard data for automatic NLP experiments with such synonyms, such as synonym discovery, feature mapping, etc. However, perhaps the most important goal is to eventually build an event type ontology that can be referenced and used as a human-readable and human-understandable “database” for all types of events, processes and states. While the current paper primarily describes the content of the lexicon, we also present a preliminary design of a format compatible with Linked Data, on which we hope to get feedback during discussions at the workshop. Once the resource (in whichever form) is applied to corpus annotation, deep analysis will be possible using such combined resources as training data.
2019
Parallel Dependency Treebank Annotated with Interlinked Verbal Synonym Classes and Roles
Zdeňka Urešová, Eva Fučíková, Eva Hajičová, Jan Hajič
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)
MRP 2019: Cross-Framework Meaning Representation Parsing
Stephan Oepen, Omri Abend, Jan Hajič, Daniel Hershcovich, Marco Kuhlmann, Tim O’Gorman, Nianwen Xue, Jayeol Chun, Milan Straka, Zdeňka Urešová
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning
The 2019 Shared Task at the Conference on Computational Natural Language Learning (CoNLL) was devoted to Meaning Representation Parsing (MRP) across frameworks. Five distinct approaches to the representation of sentence meaning in the form of directed graphs were represented in the training and evaluation data for the task, packaged in a uniform abstract graph representation and serialization. The task received submissions from eighteen teams, of which five did not participate in the official ranking because they arrived after the closing deadline, made use of additional training data, or involved one of the task co-organizers. All technical information regarding the task, including system submissions, official results, and links to supporting resources and software, is available from the task web site at: http://mrp.nlpl.eu
2018
Synonymy in Bilingual Context: The CzEngClass Lexicon
Zdeňka Urešová, Eva Fučíková, Eva Hajičová, Jan Hajič
Proceedings of the 27th International Conference on Computational Linguistics
This paper describes CzEngClass, a bilingual lexical resource being built to investigate verbal synonymy in bilingual context and to relate semantic roles common to one synonym class to verb arguments (verb valency). In addition, the resource is linked to existing resources with the same or a similar aim: English and Czech WordNet, FrameNet, PropBank, VerbNet (SemLink), and valency lexicons for Czech and English (PDT-Vallex, Vallex, and EngVallex). There are several goals of this work and resource: (a) to provide gold standard data for automatic experiments in the future (such as automatic discovery of synonym classes, word sense disambiguation, assignment of classes to occurrences of verbs in text, coreferential linking of verb and event arguments in text, etc.), (b) to build a core (bilingual) lexicon linked to existing resources, for comparative studies and possibly for training automatic tools, and (c) to enrich the annotation of a parallel treebank, the Prague Czech-English Dependency Treebank, which until now contained valency annotation but did not link synonymous senses of verbs together. The method used for extracting the synonym classes is a semi-automatic process with a substantial amount of manual work during filtering, role assignment to classes and individual class members’ arguments, and linking to the external lexical resources. We present the first version with 200 classes (about 1800 verbs) and evaluate interannotator agreement using several metrics.
Tools for Building an Interlinked Synonym Lexicon Network
Zdeňka Urešová, Eva Fučíková, Eva Hajičová, Jan Hajič
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Creating a Verb Synonym Lexicon Based on a Parallel Corpus
Zdeňka Urešová, Eva Fučíková, Eva Hajičová, Jan Hajič
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2017
CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinková, Jan Hajič jr., Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi Kanayama, Valeria de Paiva, Kira Droganova, Héctor Martínez Alonso, Çağrı Çöltekin, Umut Sulubacak, Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Georg Rehm, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Jesse Kirchner, Hector Fernandez Alcalde, Jana Strnadová, Esha Banerjee, Ruli Manurung, Antonio Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo Mendonça, Tatiana Lando, Rattima Nitisaroj, Josie Li
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.
2016
Non-projectivity and valency
Zdeňka Urešová, Eva Fučíková, Jan Hajič
Proceedings of the Workshop on Discontinuous Structures in Natural Language Processing
Inherently Pronominal Verbs in Czech: Description and Conversion Based on Treebank Annotation
Zdeňka Urešová, Eduard Bejček, Jan Hajič
Proceedings of the 12th Workshop on Multiword Expressions
Enriching a Valency Lexicon by Deverbative Nouns
Eva Fučíková, Jan Hajič, Zdeňka Urešová
Proceedings of the Workshop on Grammar and Lexicon: interactions and interfaces (GramLex)
We present an attempt to automatically identify Czech deverbative nouns using several methods that use large corpora as well as existing lexical resources. The motivation for the task is to extend a verbal valency (i.e., predicate-argument) lexicon by adding nouns that share the valency properties with the base verb, assuming their properties can be derived (even if not trivially) from the underlying verb by deterministic grammatical rules. At the same time, even in inflective languages, not all deverbatives are simply created from their underlying base verb by regular lexical derivation processes. We have thus developed hybrid techniques that use both large parallel corpora and several standard lexical resources. Thanks to the use of parallel corpora, the resulting sets also contain synonyms, which the lexical derivation rules cannot capture. For evaluation, we have manually created a small, 100-verb gold dataset, since no such dataset was initially available for Czech.
Joint search in a bilingual valency lexicon and an annotated corpus
Eva Fučíková, Jan Hajič, Zdeňka Urešová
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations
In this paper and the associated system demo, we present an advanced search system that allows users to perform a joint search over a (bilingual) valency lexicon and a correspondingly annotated linked parallel corpus. This search tool has been developed on the basis of the Prague Czech-English Dependency Treebank, but its ideas are applicable in principle to any bilingual parallel corpus that is annotated for dependencies and valency (i.e., predicate-argument structure), and where verbs are linked to appropriate entries in an associated valency lexicon. Our online search tool consolidates several search interfaces into one, providing expanded structured search capability and a more efficient, advanced way to search, allowing users to search for verb pairs, verbal argument pairs, their surface realization as recorded in the lexicon, or their surface form actually appearing in the linked parallel corpus. The search system is currently under development and is replacing our current search tool available at http://lindat.mff.cuni.cz/services/CzEngVallex, which can search the lexicon, but whose queries can neither take advantage of the underlying corpus nor use the additional surface form information from the lexicon(s). The system is available as open source.
Czech Legal Text Treebank 1.0
Vincent Kríž, Barbora Hladká, Zdeňka Urešová
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
We introduce a new member of the family of Prague dependency treebanks. The Czech Legal Text Treebank 1.0 is a morphologically and syntactically annotated corpus of 1,128 sentences. The treebank contains texts from the legal domain, namely the documents from the Collection of Laws of the Czech Republic. Legal texts differ from other domains in several language phenomena influenced by a rather high frequency of very long sentences. Manual annotation of such sentences presents a new challenge. We describe a strategy and tools for this task. The resulting treebank can be explored in various ways. It can be downloaded from the LINDAT/CLARIN repository and viewed locally using the TrEd editor, or it can be accessed online using the KonText and TreeQuery tools.
Towards Comparability of Linguistic Graph Banks for Semantic Parsing
Stephan Oepen, Marco Kuhlmann, Yusuke Miyao, Daniel Zeman, Silvie Cinková, Dan Flickinger, Jan Hajič, Angelina Ivanova, Zdeňka Urešová
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
We announce a new language resource for research on semantic parsing, a large, carefully curated collection of semantic dependency graphs representing multiple linguistic traditions. This resource is called SDP 2016 and provides an update and extension to previous versions used as Semantic Dependency Parsing target representations in the 2014 and 2015 Semantic Evaluation Exercises. For a common core of English text, this third edition comprises semantic dependency graphs from four distinct frameworks, packaged in a unified abstract format and aligned at the sentence and token levels. SDP 2016 is the first general release of this resource and becomes available for licensing from the Linguistic Data Consortium in May 2016. The data is accompanied by an open-source SDP utility toolkit and system results from previous contrastive parsing evaluations against these target representations.
2015
Bilingual English-Czech Valency Lexicon Linked to a Parallel Corpus
Zdeňka Urešová, Ondřej Dušek, Eva Fučíková, Jan Hajič, Jana Šindlerová
Proceedings of The 9th Linguistic Annotation Workshop
Using Parallel Texts and Lexicons for Verbal Word Sense Disambiguation
Ondřej Dušek, Eva Fučíková, Jan Hajič, Martin Popel, Jana Šindlerová, Zdeňka Urešová
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)
Zero Alignment of Verb Arguments in a Parallel Treebank
Jana Šindlerová, Eva Fučíková, Zdeňka Urešová
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)
SemEval 2015 Task 18: Broad-Coverage Semantic Dependency Parsing
Stephan Oepen, Marco Kuhlmann, Yusuke Miyao, Daniel Zeman, Silvie Cinková, Dan Flickinger, Jan Hajič, Zdeňka Urešová
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)
2014
Resources in Conflict: A Bilingual Valency Lexicon vs. a Bilingual Treebank vs. a Linguistic Theory
Jana Šindlerová, Zdeňka Urešová, Eva Fučíková
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In this paper, we would like to exemplify how a syntactically annotated bilingual treebank can help us in exploring and revising a developed linguistic theory. On the material of the Prague Czech-English Dependency Treebank, we observe sentences in which an Addressee argument in one language is linked translationally to a Patient argument in the other one, and make generalizations about the theoretical grounds of these argument non-correspondences and their relation to valency theory beyond the annotation practice. Exploring verbs of three semantic classes (Judgement verbs, Teaching verbs and Attempt Suasion verbs), we claim that the Functional Generative Description argument labelling is highly dependent on the morphosyntactic realization of the individual participants, which then results in valency frame differences. Nevertheless, most of the differences can be overcome without substantial changes to the linguistic theory itself.
Not an Interlingua, But Close: Comparison of English AMRs to Chinese and Czech
Nianwen Xue, Ondřej Bojar, Jan Hajič, Martha Palmer, Zdeňka Urešová, Xiuhong Zhang
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Abstract Meaning Representations (AMRs) are rooted, directional and labeled graphs that abstract away from morpho-syntactic idiosyncrasies such as word category (verbs and nouns), word order, and function words (determiners, some prepositions). Because these syntactic idiosyncrasies account for many of the cross-lingual differences, it would be interesting to see if this representation can serve, e.g., as a useful, minimally divergent transfer layer in machine translation. To answer this question, we have translated 100 English sentences that have existing AMRs into Chinese and Czech to create AMRs for them. A cross-linguistic comparison of English to Chinese and Czech AMRs reveals both cases where the AMRs for the language pairs align well structurally and cases of linguistic divergence. We found that the level of compatibility of AMR between English and Chinese is higher than between English and Czech. We believe this kind of comparison is beneficial to further refining the annotation standards for each of the three languages and will lead to more compatible annotation guidelines between the languages.
Multilingual Test Sets for Machine Translation of Search Queries for Cross-Lingual Information Retrieval in the Medical Domain
Zdeňka Urešová, Jan Hajič, Pavel Pecina, Ondřej Dušek
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper presents development and test sets for machine translation of search queries in cross-lingual information retrieval in the medical domain. The data consist of a total of 1,508 real user queries in English, translated into Czech, German, and French. We describe the translation and review process involving medical professionals and present a baseline experiment in which our data sets are used for tuning and evaluation of a machine translation system.
Verbal Valency Frame Detection and Selection in Czech and English
Ondřej Dušek, Jan Hajič, Zdeňka Urešová
Proceedings of the Second Workshop on EVENTS: Definition, Detection, Coreference, and Representation
Machine Translation of Medical Texts in the Khresmoi Project
Ondřej Dušek, Jan Hajič, Jaroslava Hlaváčová, Michal Novák, Pavel Pecina, Rudolf Rosa, Aleš Tamchyna, Zdeňka Urešová, Daniel Zeman
Proceedings of the Ninth Workshop on Statistical Machine Translation
Comparing Czech and English AMRs
Jan Hajič, Ondřej Bojar, Zdeňka Urešová
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing
2013
An Analysis of Annotation of Verb-Noun Idiomatic Combinations in a Parallel Dependency Corpus
Zdeňka Urešová, Jan Hajič, Eva Fučíková, Jana Šindlerová
Proceedings of the 9th Workshop on Multiword Expressions
2012
Announcing Prague Czech-English Dependency Treebank 2.0
Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Ondřej Bojar, Silvie Cinková, Eva Fučíková, Marie Mikulová, Petr Pajas, Jan Popelka, Jiří Semecký, Jana Šindlerová, Jan Štěpánek, Josef Toman, Zdeňka Urešová, Zdeněk Žabokrtský
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation. The English part consists of the Wall Street Journal (WSJ) section of the Penn Treebank. The Czech part was translated from the English source sentence by sentence. This paper gives a high-level overview of the underlying linguistic theory (the so-called tectogrammatical annotation) with some details of the most important features, such as valency annotation, ellipsis reconstruction, and coreference.
2009
Syntactic annotation of spoken utterances: A case study on the Czech Academic Corpus
Barbora Hladká, Zdeňka Urešová
Proceedings of the Third Linguistic Annotation Workshop (LAW III)
2002
The Theory of Control Applied to the Prague Dependency Treebank (PDT)
Jarmila Panevová, Veronika Řezníčková, Zdeňka Urešová
Proceedings of the Sixth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+6)