Eva Hajicova

Also published as: E. Hajicova, Eva Hajicová, Eva Hajičová


Developing a Rhetorical Structure Theory Treebank for Czech
Lucie Polakova | Jiří Mírovský | Šárka Zikánová | Eva Hajicova
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We introduce the first version of the Czech RST Discourse Treebank, a collection of Czech journalistic texts manually annotated using the Rhetorical Structure Theory (RST), a global coherence model proposed by Mann and Thompson (1988). Each document in the corpus is represented as a single tree-like structure, where discourse units are interconnected through hierarchical rhetorical relations and their relative importance for the main purpose of a text is modeled by the nuclearity principle. The treebank is freely available in the LINDAT/CLARIAH-CZ repository under the Creative Commons license; for some documents, it includes two gold annotations representing divergent yet relevant interpretations. The paper outlines the annotation process, provides corpus statistics and evaluation, and discusses the issue of consistency associated with the global level of textual interpretation. In general, good agreement on the structure and labeling could be achieved on the lowest, local tree level and on the identification of the most central (nuclear) elementary discourse units. Disagreements mostly concerned segmentation and, in the structure, differences in the stepwise process of linking the largest text blocks. The project contributes to the advancement of RST research and its application to real-world text analysis challenges.


Advantages of a Complex Multilayer Annotation Scheme: The Case of the Prague Dependency Treebank
Eva Hajicova | Marie Mikulová | Barbora Štěpánková | Jiří Mírovský
Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022

Recently, many corpora have been developed that contain multiple annotations of various linguistic phenomena, from morphological categories of words through the syntactic structure of sentences to discourse and coreference relations in texts. Discussions are ongoing on an appropriate annotation scheme for a large amount of diverse information. In our contribution we express our conviction that a multilayer annotation scheme makes it possible to view the language system in its complexity and in the interaction of individual phenomena and that there are at least two aspects that support such a scheme: (i) A multilayer annotation scheme makes it possible to use the annotation of one layer to design the annotation of other layers, both conceptually and in the form of a pre-annotation procedure or annotation checking rules. (ii) A multilayer annotation scheme presents a reliable ground for corpus studies based on features across the layers. These aspects are demonstrated on the case of the Prague Dependency Treebank. Its multilayer annotation scheme has withstood the test of time and also serves well for complex textual annotations, in which earlier morpho-syntactic annotations are advantageously used. In addition to a reference to the previous projects that utilise its annotation scheme, we present several current investigations.


SynSemClass Linked Lexicon: Mapping Synonymy between Languages
Zdenka Uresova | Eva Fucikova | Eva Hajicova | Jan Hajic
Proceedings of the 2020 Globalex Workshop on Linked Lexicography

This paper reports on an extended version of a synonym verb class lexicon, newly called SynSemClass (formerly CzEngClass). This lexicon stores cross-lingual semantically similar verb senses in synonym classes extracted from a richly annotated parallel corpus, the Prague Czech-English Dependency Treebank. When building the lexicon, we make use of predicate-argument relations (valency) and link them to semantic roles; in addition, each entry is linked to several external lexicons of more or less “semantic” nature, namely FrameNet, WordNet, VerbNet, OntoNotes and PropBank, and Czech VALLEX. The aim is to provide a linguistic resource that can be used to compare semantic roles and their syntactic properties and features across languages within and across synonym groups (classes, or ‘synsets’), as well as gold standard data for automatic NLP experiments with such synonyms, such as synonym discovery, feature mapping, etc. However, perhaps the most important goal is to eventually build an event type ontology that can be referenced and used as a human-readable and human-understandable “database” for all types of events, processes and states. While the current paper describes primarily the content of the lexicon, we are also presenting a preliminary design of a format compatible with Linked Data, on which we are hoping to get feedback during discussions at the workshop. Once the resource (in whichever form) is applied to corpus annotation, deep analysis will be possible using such combined resources as training data.


A Plea for Information Structure as a Part of Meaning Representation
Eva Hajicova
Proceedings of the First International Workshop on Designing Meaning Representations

The view that the representation of information structure (IS) should be a part of (any type of) representation of meaning is based on the fact that IS is a semantically relevant phenomenon. In the contribution, three arguments supporting this view are briefly summarized, namely, the relation of IS to the interpretation of negation and presupposition, the relevance of IS to the understanding of discourse connectivity and for the establishment and interpretation of coreference relations. Afterwards, possible integration of the description of the main ingredient of IS into a meaning representation is illustrated.

Delimiting Adverbial Meanings. A corpus-based comparative study on Czech spatial prepositions and their English equivalents
Marie Mikulová | Veronika Kolářová | Jarmila Panevová | Eva Hajičová
Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)

Parallel Dependency Treebank Annotated with Interlinked Verbal Synonym Classes and Roles
Zdeňka Urešová | Eva Fučíková | Eva Hajičová | Jan Hajič
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

Ordering of Adverbials of Time and Place in Grammars and in an Annotated English-Czech Parallel Corpus
Eva Hajičová | Jiří Mírovský | Kateřina Rysová
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)


Tools for Building an Interlinked Synonym Lexicon Network
Zdeňka Urešová | Eva Fučíková | Eva Hajičová | Jan Hajič
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Creating a Verb Synonym Lexicon Based on a Parallel Corpus
Zdeňka Urešová | Eva Fučíková | Eva Hajičová | Jan Hajič
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Discourse Coherence Through the Lens of an Annotated Text Corpus: A Case Study
Eva Hajičová | Jiří Mírovský
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Synonymy in Bilingual Context: The CzEngClass Lexicon
Zdeňka Urešová | Eva Fučíková | Eva Hajičová | Jan Hajič
Proceedings of the 27th International Conference on Computational Linguistics

This paper describes CzEngClass, a bilingual lexical resource being built to investigate verbal synonymy in bilingual context and to relate semantic roles common to one synonym class to verb arguments (verb valency). In addition, the resource is linked to existing resources with the same or a similar aim: English and Czech WordNet, FrameNet, PropBank, VerbNet (SemLink), and valency lexicons for Czech and English (PDT-Vallex, Vallex, and EngVallex). There are several goals of this work and resource: (a) to provide gold standard data for automatic experiments in the future (such as automatic discovery of synonym classes, word sense disambiguation, assignment of classes to occurrences of verbs in text, coreferential linking of verb and event arguments in text, etc.), (b) to build a core (bilingual) lexicon linked to existing resources, for comparative studies and possibly for training automatic tools, and (c) to enrich the annotation of a parallel treebank, the Prague Czech English Dependency Treebank, which so far has contained valency annotation but has not linked synonymous senses of verbs together. The method used for extracting the synonym classes is a semi-automatic process with a substantial amount of manual work during filtering, role assignment to classes and individual class members’ arguments, and linking to the external lexical resources. We present the first version with 200 classes (about 1800 verbs) and evaluate interannotator agreement using several metrics.


Syntax-Semantics Interface: A Plea for a Deep Dependency Sentence Structure
Eva Hajičová
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

The Relation of Form and Function in Linguistic Theory and in a Multilayer Treebank
Eduard Bejček | Eva Hajičová | Marie Mikulová | Jarmila Panevová
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories


Proceedings of the Workshop on Grammar and Lexicon: interactions and interfaces (GramLex)
Eva Hajičová | Igor Boguslavsky
Proceedings of the Workshop on Grammar and Lexicon: interactions and interfaces (GramLex)


Obituaries: Jane J. Robinson
Barbara J. Grosz | Eva Hajicova | Aravind Joshi
Computational Linguistics, Volume 41, Issue 4 - December 2015

Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)
Joakim Nivre | Eva Hajičová
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

Reconstructions of Deletions in a Dependency-based Description of Czech: Selected Issues
Eva Hajičová | Marie Mikulová | Jarmila Panevová
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)


Three dimensions of the so-called “interoperability” of annotation schemes
Eva Hajičová
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

“Interoperability” of annotation schemes is one of the key words in the discussions about annotation of corpora. In the present contribution, we propose to look at the so-called interoperability from (at least) three angles, namely (i) as a relation (and possible interaction or cooperation) of different annotation schemes for different layers or phenomena of a single language, (ii) the possibility to annotate different languages by a single (modified or not) annotation scheme, and (iii) the relation between different annotation schemes for a single language, or for a single phenomenon or layer of the same language. The pros and cons of each of these aspects are discussed as well as their contribution to linguistic studies and natural language processing. It is stressed that a communication and collaboration between different annotation schemes requires an explicit specification and consistency of each of the schemes.


Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)
Eva Hajičová | Kim Gerdes | Leo Wanner
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

(Pre-)Annotation of Topic-Focus Articulation in Prague Czech-English Dependency Treebank
Jiří Mírovský | Kateřina Rysová | Magdaléna Rysová | Eva Hajičová
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Introducing the Prague Discourse Treebank 1.0
Lucie Poláková | Jiří Mírovský | Anna Nedoluzhko | Pavlína Jínová | Šárka Zikánová | Eva Hajičová
Proceedings of the Sixth International Joint Conference on Natural Language Processing


Proceedings of the Workshop on Advances in Discourse Analysis and its Computational Aspects
Eva Hajičová | Lucie Poláková | Jiří Mírovský
Proceedings of the Workshop on Advances in Discourse Analysis and its Computational Aspects

Announcing Prague Czech-English Dependency Treebank 2.0
Jan Hajič | Eva Hajičová | Jarmila Panevová | Petr Sgall | Ondřej Bojar | Silvie Cinková | Eva Fučíková | Marie Mikulová | Petr Pajas | Jan Popelka | Jiří Semecký | Jana Šindlerová | Jan Štěpánek | Josef Toman | Zdeňka Urešová | Zdeněk Žabokrtský
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation. The English part consists of the Wall Street Journal (WSJ) section of the Penn Treebank. The Czech part was translated from the English source sentence by sentence. This paper gives a high level overview of the underlying linguistic theory (the so-called tectogrammatical annotation) with some details of the most important features like valency annotation, ellipsis reconstruction or coreference.


Resource and Service Centres as the Backbone for a Sustainable Service Infrastructure
Peter Wittenburg | Nuria Bel | Lars Borin | Gerhard Budin | Nicoletta Calzolari | Eva Hajicova | Kimmo Koskenniemi | Lothar Lemnitzer | Bente Maegaard | Maciej Piasecki | Jean-Marie Pierrel | Stelios Piperidis | Inguna Skadina | Dan Tufis | Remco van Veenendaal | Tamas Váradi | Martin Wynne
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Currently, research infrastructures are being designed and established in many disciplines since they all suffer from an enormous fragmentation of their resources and tools. In the domain of language resources and tools the CLARIN initiative has been funded since 2008 to overcome many of the integration and interoperability hurdles. CLARIN can build on knowledge and work from many projects that were carried out in recent years and wants to build stable and robust services that can be used by researchers. Service centres that have the potential to be persistent and that adhere to the criteria established by CLARIN will play an important role here. In the last year of the so-called preparatory phase these centres are currently developing four use cases that can demonstrate how the various pillars CLARIN has been working on can be integrated. All four use cases fulfil the criteria of being cross-national.


From Sentence to Discourse: Building an Annotation Scheme for Discourse Based on Prague Dependency Treebank
Lucie Mladová | Šárka Zikánová | Eva Hajičová
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The present paper reports on a preparatory research for building a language corpus annotation scenario capturing the discourse relations in Czech. We primarily focus on the description of the syntactically motivated relations in discourse, basing our findings on the theoretical background of the Prague Dependency Treebank 2.0 and the Penn Discourse Treebank 2. Our aim is to revisit the present-day syntactico-semantic (tectogrammatical) annotation in the Prague Dependency Treebank, extend it for the purposes of a sentence-boundary-crossing representation and eventually to design a new, discourse level of annotation. In this paper, we propose a feasible process of such a transfer, comparing the possibilities the Praguian dependency-based approach offers with the Penn discourse annotation based primarily on the analysis and classification of discourse connectives.


Discourse Annotation Working Group Report
Manfred Stede | Janyce Wiebe | Eva Hajičová | Brian Reese | Simone Teufel | Bonnie Webber | Theresa Wilson
Proceedings of the Linguistic Annotation Workshop


Corpus Annotation as a Test of a Linguistic Theory
Eva Hajičová | Petr Sgall
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In the present contribution we claim that corpus annotation serves, among other things, as an invaluable test for linguistic theories standing behind the annotation schemes, and as such represents an irreplaceable resource of linguistic information for the build-up of grammars. To support this claim we present four linguistic phenomena for the study and relevant description of which in grammar a deep layer of corpus annotation as introduced in the Prague Dependency Treebank has brought important observations, namely the information structure of the sentence, condition of projectivity and word order, types of dependency relations and textual coreference.

ACL Lifetime Achievement Award: Old Linguists Never Die, They Only Get Obligatorily Deleted
Eva Hajicova
Computational Linguistics, Volume 32, Number 4, December 2006


Deep Syntactic Annotation: Tectogrammatical Representation and Beyond
Petr Sgall | Jarmila Panevová | Eva Hajičová
Proceedings of the Workshop Frontiers in Corpus Annotation at HLT-NAACL 2004

Condition of Projectivity in the Underlying Dependency Structures
Katerina Veselá | Jiri Havelka | Eva Hajicová
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Annotators’ Agreement: The Case of Topic-Focus Articulation
Kateřina Veselá | Jiří Havelka | Eva Hajičová
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

The annotation of the Prague Dependency Treebank (PDT) is conceived of as a multilayered scenario that also comprises dependency representations (tectogrammatical tree structures, TGTS's) of the underlying structure of the sentences. TGTS's capture three basic aspects of the underlying structure of sentences: (a) the dependency tree structure, (b) the kinds of dependency syntactic relations, and (c) the basic characteristics of the topic-focus articulation (TFA). Since the PDT is a large collection and the annotations on the deepest layer are to a large extent performed by several human annotators (based on an automatic preprocessing module), it is more than necessary to observe the consistency of annotators and the agreement among them. In the present paper, we summarize the results of the evaluation of parallel annotations of several samples taken from PDT and the measures adopted to improve the consistency of annotations.


Argument/Valency Structure in PropBank, LCS Database and Prague Dependency Treebank: A Comparative Pilot Study
Eva Hajičová | Ivona Kučerová
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)


Topic-focus and Salience
Eva Hajicová | Petr Sgall
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics


Tagging of very large corpora: Topic-Focus Articulation
Eva Buranova | Eva Hajicova | Petr Sgall
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

Deletions and their reconstruction in tectogrammatical syntactic tagging of very large corpora
Eva Hajicová | Markéta Ceplová
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

Semantico-syntactic Tagging of Very Large Corpora: the Case of Restoration of Nodes on the Underlying Level
Eva Hajičová | Petr Sgall
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Coreference in Annotating a Large Corpus
Eva Hajičová | Jarmila Panevová | Petr Sgall
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)


Movement rules revisited
Eva Hajicova
Processing of Dependency-Based Grammars


Identifying Topic and Focus by an Automatic Procedure
Eva Hajicova | Hana Skoumalova | Petr Sgall
Computational Linguistics, Volume 21, Number 1, March 1995


Identifying Topic and Focus by an Automatic Procedure
Eva Hajicova | Petr Sgall | Hana Skoumalova
Sixth Conference of the European Chapter of the Association for Computational Linguistics


Stock of Shared Knowledge - A Tool for Solving Pronominal Anaphora
Eva Hajicova | Vladislav Kubon | Petr Kubon
COLING 1992 Volume 1: The 14th International Conference on Computational Linguistics

Derivation of Underlying Valency Frames From a Learner’s Dictionary
Alexandr Rosen | Eva Hajicova | Jan Hajic
COLING 1992 Volume 2: The 14th International Conference on Computational Linguistics


Proceedings of the Second International Workshop on Parsing Technologies (IWPT ’91)
Masaru Tomita | Martin Kay | Robert Berwick | Eva Hajicova | Aravind Joshi | Ronald Kaplan | Makoto Nagao | Yorick Wilks
Proceedings of the Second International Workshop on Parsing Technologies

February 13-25, 1991


Hierarchy of Salience and Discourse Analysis and Production
Eva Hajicova | Petr Kubon | Vladislav Kubon
COLING 1990 Volume 3: Papers presented to the 13th International Conference on Computational Linguistics


A Dependency-Based Parser for Topic and Focus
Eva Hajičová
Proceedings of the First International Workshop on Parsing Technologies


Reasons Why We Use Dependency Grammar
Eva Hajicova
Coling Budapest 1988 Volume 2: International Conference on Computational Linguistics


Fail-Soft (“Emergency”) Measures in a Production-Oriented MT System
Eva Hajicova | Zdenek Kirschner
Third Conference of the European Chapter of the Association for Computational Linguistics


Degrees of Understanding
Eva Hajicova | Petr Sgall
Coling 1986 Volume 1: The 11th International Conference on Computational Linguistics


Towards an Automatic Identification of Topic and Focus
Eva Hajicova | Petr Sgall
Second Conference of the European Chapter of the Association for Computational Linguistics


Inferencing on Linguistically Based Semantic Structures
Eva Hajičová | Milena Hnátková
10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics


Structure of Sentence and Inferencing in Question Answering
Eva Hajicova | Petr Sgall
First Conference of the European Chapter of the Association for Computational Linguistics


The Role of the Hierarchy of Activation in the Process of Natural Language Understanding
Eva Hajicova | Jarka Vrbova
Coling 1982: Proceedings of the Ninth International Conference on Computational Linguistics


Linguistic Meaning and Knowledge Representation in Automatic Understanding of Natural Language
Eva Hajicova | Petr Sgall
COLING 1980 Volume 1: The 8th International Conference on Computational Linguistics


On Semantics of Some Verbal Categories in English
Eva Hajicova
International Conference on Computational Linguistics COLING 1969: Preprint No. 62: Collection of Abstracts of Papers

SOME REMARKS ON J. L. MEY’s PAPER (Preprint No. 20)
P. Sgall | E. Hajicova
International Conference on Computational Linguistics COLING 1969