Anna Nedoluzhko


2021

Do UD Trees Match Mention Spans in Coreference Annotations?
Martin Popel | Zdeněk Žabokrtský | Anna Nedoluzhko | Michal Novák | Daniel Zeman
Findings of the Association for Computational Linguistics: EMNLP 2021

One can find dozens of data resources for various languages in which coreference - a relation between two or more expressions that refer to the same real-world entity - is manually annotated. One could also assume that such expressions usually constitute syntactically meaningful units; however, mention spans have been annotated simply by delimiting token intervals in most coreference projects, i.e., independently of any syntactic representation. We argue that it could be advantageous to make syntactic and coreference annotations convergent in the long term. We present a pilot empirical study focused on matches and mismatches between hand-annotated linear mention spans and automatically parsed syntactic trees that follow Universal Dependencies conventions. The study covers 9 datasets for 8 different languages.

Is one head enough? Mention heads in coreference annotations compared with UD-style heads
Anna Nedoluzhko | Michal Novák | Martin Popel | Zdeněk Žabokrtský | Daniel Zeman
Proceedings of the Sixth International Conference on Dependency Linguistics (Depling, SyntaxFest 2021)

2018

PAWS: A Multi-lingual Parallel Treebank with Anaphoric Relations
Anna Nedoluzhko | Michal Novák | Maciej Ogrodniczuk
Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference

We present PAWS, a multi-lingual parallel treebank with coreference annotation. It consists of English texts from the Wall Street Journal translated into Czech, Russian and Polish. In addition, the texts are syntactically parsed and word-aligned. PAWS is based on PCEDT 2.0 and continues the tradition of multilingual treebanks with coreference annotation. The paper focuses on the coreference annotation in PAWS and its language-specific differences. PAWS offers linguistic material that can be further leveraged in cross-lingual studies, especially on coreference.

2017

Projection-based Coreference Resolution Using Deep Syntax
Michal Novák | Anna Nedoluzhko | Zdeněk Žabokrtský
Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017)

The paper describes the system for coreference resolution in German and Russian, trained exclusively on coreference relations projected through a parallel corpus from English. The resolver operates on the level of deep syntax and makes use of multiple specialized models. It achieves 32 and 22 points in terms of CoNLL score for Russian and German, respectively. Analysis of the evaluation results shows that the resolver for Russian is able to preserve 66% of the English resolver’s quality in terms of CoNLL score. The system was submitted to the Closed track of the CORBON 2017 Shared task.

CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman | Martin Popel | Milan Straka | Jan Hajič | Joakim Nivre | Filip Ginter | Juhani Luotolahti | Sampo Pyysalo | Slav Petrov | Martin Potthast | Francis Tyers | Elena Badmaeva | Memduh Gokirmak | Anna Nedoluzhko | Silvie Cinková | Jan Hajič jr. | Jaroslava Hlaváčová | Václava Kettnerová | Zdeňka Urešová | Jenna Kanerva | Stina Ojala | Anna Missilä | Christopher D. Manning | Sebastian Schuster | Siva Reddy | Dima Taji | Nizar Habash | Herman Leung | Marie-Catherine de Marneffe | Manuela Sanguinetti | Maria Simi | Hiroshi Kanayama | Valeria de Paiva | Kira Droganova | Héctor Martínez Alonso | Çağrı Çöltekin | Umut Sulubacak | Hans Uszkoreit | Vivien Macketanz | Aljoscha Burchardt | Kim Harris | Katrin Marheinecke | Georg Rehm | Tolga Kayadelen | Mohammed Attia | Ali Elkahky | Zhuoran Yu | Emily Pitler | Saran Lertpradit | Michael Mandl | Jesse Kirchner | Hector Fernandez Alcalde | Jana Strnadová | Esha Banerjee | Ruli Manurung | Antonio Stella | Atsuko Shimada | Sookyoung Kwak | Gustavo Mendonça | Tatiana Lando | Rattima Nitisaroj | Josie Li
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

2016

Abstract Coreference in a Multilingual Perspective: a View on Czech and German
Anna Nedoluzhko | Ekaterina Lapshinova-Koltunski
Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016)

Bridging Corpus for Russian in comparison with Czech
Anna Roitberg | Anna Nedoluzhko
Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016)

A new look at possessive reflexivization: A comparative study between Czech and Russian
Anna Nedoluzhko
Proceedings of the Workshop on Grammar and Lexicon: interactions and interfaces (GramLex)

The paper presents a contrastive description of reflexive possessive pronouns “svůj” in Czech and “svoj” in Russian. The research concerns syntactic, semantic and pragmatic aspects. With our analysis, we shed new light on the already investigated issue, which comes from a detailed comparison of the phenomenon of possessive reflexivization in two typologically and genetically similar languages. We show that whereas in Czech, the possessive reflexivization is mostly limited to syntactic functions and does not go beyond the grammar, in Russian it gets additional semantic meanings and moves substantially towards the lexicon. The obtained knowledge allows us to explain heretofore unclear marginal uses of reflexives in each language.

Coreference in Prague Czech-English Dependency Treebank
Anna Nedoluzhko | Michal Novák | Silvie Cinková | Marie Mikulová | Jiří Mírovský
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present coreference annotation on parallel Czech-English texts of the Prague Czech-English Dependency Treebank (PCEDT). The paper describes innovations made to PCEDT 2.0 concerning coreference, as well as coreference information already present there. We characterize the coreference annotation scheme, give the statistics and compare our annotation with the coreference annotation in OntoNotes and Prague Dependency Treebank for Czech. We also present the experiments made using this corpus to improve the alignment of coreferential expressions, which helps us to collect better statistics of correspondences between types of coreferential relations in Czech and English. The corpus released as PCEDT 2.0 Coref is publicly available.

From Interoperable Annotations towards Interoperable Resources: A Multilingual Approach to the Analysis of Discourse
Ekaterina Lapshinova-Koltunski | Kerstin Anna Kunz | Anna Nedoluzhko
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In the present paper, we analyse variation of discourse phenomena in two typologically different languages, i.e. in German and Czech. The novelty of our approach lies in the nature of the resources we are using. Advantage is taken of existing resources, which are, however, annotated on the basis of two different frameworks. We use an interoperable scheme unifying discourse phenomena in both frameworks into more abstract categories and considering only those phenomena that have a direct match in German and Czech. The discourse properties we focus on are relations of identity, semantic similarity, ellipsis and discourse relations. Our study shows that the application of interoperable schemes allows an exploitation of discourse-related phenomena analysed in different projects and on the basis of different frameworks. As corpus compilation and annotation is a time-consuming task, positive results of this experiment open up new paths for contrastive linguistics, translation studies and NLP, including machine translation.

2015

Across Languages and Genres: Creating a Universal Annotation Scheme for Textual Relations
Ekaterina Lapshinova-Koltunski | Anna Nedoluzhko | Kerstin Anna Kunz
Proceedings of The 9th Linguistic Annotation Workshop

2013

Generic noun phrases and annotation of coreference and bridging relations in the Prague Dependency Treebank
Anna Nedoluzhko
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

Translation of “It” in a Deep Syntax Framework
Michal Novák | Anna Nedoluzhko | Zdeněk Žabokrtský
Proceedings of the Workshop on Discourse in Machine Translation

Annotators’ Certainty and Disagreements in Coreference and Bridging Annotation in Prague Dependency Treebank
Anna Nedoluzhko | Jiří Mírovský
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

How Dependency Trees and Tectogrammatics Help Annotating Coreference and Bridging Relations in Prague Dependency Treebank
Anna Nedoluzhko | Jiří Mírovský
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

Introducing the Prague Discourse Treebank 1.0
Lucie Poláková | Jiří Mírovský | Anna Nedoluzhko | Pavlína Jínová | Šárka Zikánová | Eva Hajičová
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Two Case Studies on Translating Pronouns in a Deep Syntax Framework
Michal Novák | Zdeněk Žabokrtský | Anna Nedoluzhko
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2010

Annotation Tool for Extended Textual Coreference and Bridging Anaphora
Jiří Mírovský | Petr Pajas | Anna Nedoluzhko
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present an annotation tool for the extended textual coreference and the bridging anaphora in the Prague Dependency Treebank 2.0 (PDT 2.0). After we very briefly describe the annotation scheme, we focus on details of the annotation process from the technical point of view. We present the way of helping the annotators by several useful features implemented in the annotation tool, such as a possibility to combine surface and deep syntactic representation of sentences during the annotation, an automatic maintaining of the coreferential chain, underlining candidates for antecedents, etc. For studying differences among parallel annotations, the tool offers a simultaneous depicting of several annotations of the same data. The annotation tool can be used for other corpora too, as long as they have been transformed to the PML format. We present modifications of the tool for working with the coreference relations on other layers of language description, namely on the analytical layer and the morphological layer of PDT.

2009

The Coding Scheme for Annotating Extended Nominal Coreference and Bridging Anaphora in the Prague Dependency Treebank
Anna Nedoluzhko | Jiří Mírovský | Petr Pajas
Proceedings of the Third Linguistic Annotation Workshop (LAW III)