Marie Mikulová


2020

Prague Dependency Treebank - Consolidated 1.0
Jan Hajič | Eduard Bejček | Jaroslava Hlaváčová | Marie Mikulová | Milan Straka | Jan Štěpánek | Barbora Štěpánková
Proceedings of the 12th Language Resources and Evaluation Conference

We present a richly annotated and genre-diversified language resource, the Prague Dependency Treebank-Consolidated 1.0 (PDT-C 1.0), the purpose of which is - as has always been the case for the family of the Prague Dependency Treebanks - to serve both as training data for various types of NLP tasks and as a resource for linguistically oriented research. PDT-C 1.0 contains four different datasets of Czech, uniformly annotated using the standard PDT scheme (although not everything is annotated manually, as we describe in detail here). The texts come from different sources: daily newspaper articles, Czech translations of the Wall Street Journal, transcribed dialogs and a small amount of user-generated, short, often non-standard language segments typed into a web translator. Altogether, the treebank contains around 180,000 sentences with their morphological, surface and deep syntactic annotation. The diversity of the texts and annotations should serve NLP applications well, and it also makes the treebank an invaluable resource for linguistic research, including comparative studies of texts of different genres. The corpus is publicly and freely available.

2019

Delimiting Adverbial Meanings. A corpus-based comparative study on Czech spatial prepositions and their English equivalents
Marie Mikulová | Veronika Kolářová | Jarmila Panevová | Eva Hajičová
Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)

2018

ForFun 1.0: Prague Database of Forms and Functions – An Invaluable Resource for Linguistic Research
Marie Mikulová | Eduard Bejček
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

The Relation of Form and Function in Linguistic Theory and in a Multilayer Treebank
Eduard Bejček | Eva Hajičová | Marie Mikulová | Jarmila Panevová
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories

2016

Coreference in Prague Czech-English Dependency Treebank
Anna Nedoluzhko | Michal Novák | Silvie Cinková | Marie Mikulová | Jiří Mírovský
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present coreference annotation on parallel Czech-English texts of the Prague Czech-English Dependency Treebank (PCEDT). The paper describes innovations made to PCEDT 2.0 concerning coreference, as well as coreference information already present there. We characterize the coreference annotation scheme, give statistics, and compare our annotation with the coreference annotation in OntoNotes and in the Prague Dependency Treebank for Czech. We also present experiments that use this corpus to improve the alignment of coreferential expressions, which helps us collect better statistics on correspondences between types of coreferential relations in Czech and English. The corpus, released as PCEDT 2.0 Coref, is publicly available.

2015

Reconstructions of Deletions in a Dependency-based Description of Czech: Selected Issues
Eva Hajičová | Marie Mikulová | Jarmila Panevová
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

2014

Semantic Representation of Ellipsis in the Prague Dependency Treebanks
Marie Mikulová
Proceedings of the 26th Conference on Computational Linguistics and Speech Processing (ROCLING 2014)

2012

Announcing Prague Czech-English Dependency Treebank 2.0
Jan Hajič | Eva Hajičová | Jarmila Panevová | Petr Sgall | Ondřej Bojar | Silvie Cinková | Eva Fučíková | Marie Mikulová | Petr Pajas | Jan Popelka | Jiří Semecký | Jana Šindlerová | Jan Štěpánek | Josef Toman | Zdeňka Urešová | Zdeněk Žabokrtský
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation. The English part consists of the Wall Street Journal (WSJ) section of the Penn Treebank. The Czech part was translated from the English source sentence by sentence. This paper gives a high-level overview of the underlying linguistic theory (the so-called tectogrammatical annotation), with some details of the most important features, such as valency annotation, ellipsis reconstruction and coreference.

2010

Ways of Evaluation of the Annotators in Building the Prague Czech-English Dependency Treebank
Marie Mikulová | Jan Štěpánek
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present several ways to measure and evaluate the annotation and the annotators, proposed and used while building the Czech part of the Prague Czech-English Dependency Treebank. First, the basic principles of the treebank annotation project are introduced (division into three layers: morphological, analytical and tectogrammatical). The main part of the paper describes in detail one of the important phases of the annotation process: three ways of evaluating the annotators, namely inter-annotator agreement, error rate and performance. Measuring inter-annotator agreement is complicated by the fact that the data contain added and deleted nodes, making the alignment between annotations non-trivial. The error rate is measured by a set of automatic checking procedures that guard the validity of some invariants in the data. The performance of the annotators is measured by a booking web application. All three measures are then compared and related to each other.