Eva Hajicova

Also published as: Eva Hajičová, E. Hajičová, Eva Hajicová

2025

pdf bib
Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025)
Eva Hajičová | Sylvain Kahane
Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025)

2024

pdf bib abs
Developing a Rhetorical Structure Theory Treebank for Czech
Lucie Poláková | Jiří Mírovský | Šárka Zikánová | Eva Hajičová
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We introduce the first version of the Czech RST Discourse Treebank, a collection of Czech journalistic texts manually annotated using the Rhetorical Structure Theory (RST), a global coherence model proposed by Mann and Thompson (1988). Each document in the corpus is represented as a single tree-like structure, where discourse units are interconnected through hierarchical rhetorical relations and their relative importance for the main purpose of a text is modeled by the nuclearity principle. The treebank is freely available in the LINDAT/CLARIAH-CZ repository under the Creative Commons license; for some documents, it includes two gold annotations representing divergent yet relevant interpretations. The paper outlines the annotation process, provides corpus statistics and evaluation, and discusses the issue of consistency associated with the global level of textual interpretation. In general, good agreement on the structure and labeling could be achieved on the lowest, local tree level and on the identification of the most central (nuclear) elementary discourse units. Disagreements mostly concerned segmentation and, in the structure, differences in the stepwise process of linking the largest text blocks. The project contributes to the advancement of RST research and its application to real-world text analysis challenges.

2022

pdf bib abs
Advantages of a Complex Multilayer Annotation Scheme: The Case of the Prague Dependency Treebank
Eva Hajičová | Marie Mikulová | Barbora Štěpánková | Jiří Mírovský
Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022

Recently, many corpora have been developed that contain multiple annotations of various linguistic phenomena, from morphological categories of words through the syntactic structure of sentences to discourse and coreference relations in texts. Discussions are ongoing on an appropriate annotation scheme for a large amount of diverse information. In our contribution we express our conviction that a multilayer annotation scheme allows the language system to be viewed in its complexity and in the interaction of individual phenomena and that there are at least two aspects that support such a scheme: (i) A multilayer annotation scheme makes it possible to use the annotation of one layer to design the annotation of another layer(s) both conceptually and in the form of a pre-annotation procedure or annotation checking rules. (ii) A multilayer annotation scheme presents a reliable ground for corpus studies based on features across the layers. These aspects are demonstrated on the case of the Prague Dependency Treebank. Its multilayer annotation scheme has withstood the test of time and also serves well for complex textual annotations, in which earlier morpho-syntactic annotations are advantageously used. In addition to a reference to the previous projects that utilise its annotation scheme, we present several current investigations.

2020

pdf bib abs
SynSemClass Linked Lexicon: Mapping Synonymy between Languages
Zdeňka Urešová | Eva Fučíková | Eva Hajičová | Jan Hajič
Proceedings of the 2020 Globalex Workshop on Linked Lexicography

This paper reports on an extended version of a synonym verb class lexicon, newly called SynSemClass (formerly CzEngClass). This lexicon stores cross-lingual semantically similar verb senses in synonym classes extracted from a richly annotated parallel corpus, the Prague Czech-English Dependency Treebank. When building the lexicon, we make use of predicate-argument relations (valency) and link them to semantic roles; in addition, each entry is linked to several external lexicons of more or less “semantic” nature, namely FrameNet, WordNet, VerbNet, OntoNotes and PropBank, and Czech VALLEX. The aim is to provide a linguistic resource that can be used to compare semantic roles and their syntactic properties and features across languages within and across synonym groups (classes, or ‘synsets’), as well as gold standard data for automatic NLP experiments with such synonyms, such as synonym discovery, feature mapping, etc. However, perhaps the most important goal is to eventually build an event type ontology that can be referenced and used as a human-readable and human-understandable “database” for all types of events, processes and states. While the current paper describes primarily the content of the lexicon, we are also presenting a preliminary design of a format compatible with Linked Data, on which we are hoping to get feedback during discussions at the workshop. Once the resource (in whichever form) is applied to corpus annotation, deep analysis will be possible using such combined resources as training data.

2019

pdf bib abs
A Plea for Information Structure as a Part of Meaning Representation
Eva Hajičová
Proceedings of the First International Workshop on Designing Meaning Representations

The view that the representation of information structure (IS) should be a part of (any type of) representation of meaning is based on the fact that IS is a semantically relevant phenomenon. In the contribution, three arguments supporting this view are briefly summarized, namely, the relation of IS to the interpretation of negation and presupposition, the relevance of IS to the understanding of discourse connectivity, and its relevance to the establishment and interpretation of coreference relations. Afterwards, possible integration of the description of the main ingredient of IS into a meaning representation is illustrated.

pdf bib
Delimiting Adverbial Meanings. A corpus-based comparative study on Czech spatial prepositions and their English equivalents
Marie Mikulová | Veronika Kolářová | Jarmila Panevová | Eva Hajičová
Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)

pdf bib
Parallel Dependency Treebank Annotated with Interlinked Verbal Synonym Classes and Roles
Zdeňka Urešová | Eva Fučíková | Eva Hajičová | Jan Hajič
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

pdf bib
Ordering of Adverbials of Time and Place in Grammars and in an Annotated English-Czech Parallel Corpus
Eva Hajičová | Jiří Mírovský | Kateřina Rysová
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

2018

pdf bib abs
Synonymy in Bilingual Context: The CzEngClass Lexicon
Zdeňka Urešová | Eva Fučíková | Eva Hajičová | Jan Hajič
Proceedings of the 27th International Conference on Computational Linguistics

This paper describes CzEngClass, a bilingual lexical resource being built to investigate verbal synonymy in bilingual context and to relate semantic roles common to one synonym class to verb arguments (verb valency). In addition, the resource is linked to existing resources with the same or a similar aim: English and Czech WordNet, FrameNet, PropBank, VerbNet (SemLink), and valency lexicons for Czech and English (PDT-Vallex, Vallex, and EngVallex). There are several goals of this work and resource: (a) to provide gold standard data for automatic experiments in the future (such as automatic discovery of synonym classes, word sense disambiguation, assignment of classes to occurrences of verbs in text, coreferential linking of verb and event arguments in text, etc.), (b) to build a core (bilingual) lexicon linked to existing resources, for comparative studies and possibly for training automatic tools, and (c) to enrich the annotation of a parallel treebank, the Prague Czech-English Dependency Treebank, which has so far contained valency annotation but has not linked synonymous senses of verbs together. The method used for extracting the synonym classes is a semi-automatic process with a substantial amount of manual work during filtering, role assignment to classes and individual class members’ arguments, and linking to the external lexical resources. We present the first version with 200 classes (about 1800 verbs) and evaluate interannotator agreement using several metrics.

pdf bib
Tools for Building an Interlinked Synonym Lexicon Network
Zdeňka Urešová | Eva Fučíková | Eva Hajičová | Jan Hajič
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Creating a Verb Synonym Lexicon Based on a Parallel Corpus
Zdeňka Urešová | Eva Fučíková | Eva Hajičová | Jan Hajič
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Discourse Coherence Through the Lens of an Annotated Text Corpus: A Case Study
Eva Hajičová | Jiří Mírovský
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

“Interoperability” of annotation schemes is one of the key words in the discussions about annotation of corpora. In the present contribution, we propose to look at the so-called interoperability from (at least) three angles, namely (i) as a relation (and possible interaction or cooperation) of different annotation schemes for different layers or phenomena of a single language, (ii) the possibility to annotate different languages by a single (modified or not) annotation scheme, and (iii) the relation between different annotation schemes for a single language, or for a single phenomenon or layer of the same language. The pros and cons of each of these aspects are discussed as well as their contribution to linguistic studies and natural language processing. It is stressed that communication and collaboration between different annotation schemes require an explicit specification and consistency of each of the schemes.

2013

pdf bib
(Pre-)Annotation of Topic-Focus Articulation in Prague Czech-English Dependency Treebank
Jiří Mírovský | Kateřina Rysová | Magdaléna Rysová | Eva Hajičová
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)
Eva Hajičová | Kim Gerdes | Leo Wanner
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

2012

We introduce a substantial update of the Prague Czech-English Dependency Treebank, a parallel corpus manually annotated at the deep syntactic layer of linguistic representation. The English part consists of the Wall Street Journal (WSJ) section of the Penn Treebank. The Czech part was translated from the English source sentence by sentence. This paper gives a high level overview of the underlying linguistic theory (the so-called tectogrammatical annotation) with some details of the most important features like valency annotation, ellipsis reconstruction or coreference.

pdf bib
Proceedings of the Workshop on Advances in Discourse Analysis and its Computational Aspects
Eva Hajičová | Lucie Poláková | Jiří Mírovský
Proceedings of the Workshop on Advances in Discourse Analysis and its Computational Aspects

2010

Currently, research infrastructures are being designed and established in many disciplines since they all suffer from an enormous fragmentation of their resources and tools. In the domain of language resources and tools the CLARIN initiative has been funded since 2008 to overcome many of the integration and interoperability hurdles. CLARIN can build on knowledge and work from many projects that were carried out during the last years and wants to build stable and robust services that can be used by researchers. Here, service centres that have the potential of being persistent and that adhere to the criteria established by CLARIN will play an important role. In the last year of the so-called preparatory phase these centres are currently developing four use cases that can demonstrate how the various pillars CLARIN has been working on can be integrated. All four use cases fulfil the criteria of being cross-national.

2008

pdf bib abs
From Sentence to Discourse: Building an Annotation Scheme for Discourse Based on Prague Dependency Treebank
Lucie Mladová | Šárka Zikánová | Eva Hajičová
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The present paper reports on preparatory research for building a language corpus annotation scenario capturing the discourse relations in Czech. We primarily focus on the description of the syntactically motivated relations in discourse, basing our findings on the theoretical background of the Prague Dependency Treebank 2.0 and the Penn Discourse Treebank 2. Our aim is to revisit the present-day syntactico-semantic (tectogrammatical) annotation in the Prague Dependency Treebank, extend it for the purposes of a sentence-boundary-crossing representation and eventually to design a new, discourse level of annotation. In this paper, we propose a feasible process of such a transfer, comparing the possibilities the Praguian dependency-based approach offers with the Penn discourse annotation based primarily on the analysis and classification of discourse connectives.

2007

2006

pdf bib
ACL Lifetime Achievement Award: Old Linguists Never Die, They Only Get Obligatorily Deleted
Eva Hajičová
Computational Linguistics, Volume 32, Number 4, December 2006

pdf bib abs
Corpus Annotation as a Test of a Linguistic Theory
Eva Hajičová | Petr Sgall
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In the present contribution we claim that corpus annotation serves, among other things, as an invaluable test for linguistic theories standing behind the annotation schemes, and as such represents an irreplaceable resource of linguistic information for the build-up of grammars. To support this claim we present four linguistic phenomena for the study and relevant description of which in grammar a deep layer of corpus annotation as introduced in the Prague Dependency Treebank has brought important observations, namely the information structure of the sentence, condition of projectivity and word order, types of dependency relations and textual coreference.

2004

pdf bib abs
Condition of Projectivity in the Underlying Dependency Structures
Kateřina Veselá | Jiří Havelka | Eva Hajičová
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

The claim made in this paper is that in a formal description of language, it is possible and useful to work with dependency-based underlying representations of sentences (tectogrammatical representations) meeting the condition of projectivity. The reasons for the inclusion of this condition into the definition of the tectogrammatical representations are both formally and empirically sound (Section 1). An analysis of the material offered by the Prague Dependency Treebank with annotations of the underlying syntactic structure of sentences (described in Section 2) has led to an interesting classification of non-projective constructions in Czech (Section 3). It documents that most (types of) constructions that appear to be non-projective in the surface shape of sentences can be described by means of projective trees. The realization of the surface word order (with the use of movement rules) is then relegated to the morphemic level, where the representation of the sentence has the shape of a string rather than a tree.

pdf bib abs
Annotators’ Agreement: The Case of Topic-Focus Articulation
Kateřina Veselá | Jiří Havelka | Eva Hajičová
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

The annotation of the Prague Dependency Treebank (PDT) is conceived of as a multilayered scenario that also comprises dependency representations (tectogrammatical tree structures, TGTS's) of the underlying structure of the sentences. TGTS's capture three basic aspects of the underlying structure of sentences: (a) the dependency tree structure, (b) the kinds of dependency syntactic relations, and (c) the basic characteristics of the topic-focus articulation (TFA). Since the PDT is a large collection and the annotations on the deepest layer are to a large extent performed by several human annotators (based on an automatic preprocessing module), it is more than necessary to observe the consistency of annotators and the agreement among them. In the present paper, we summarize the results of the evaluation of parallel annotations of several samples taken from PDT and the measures accepted to improve the consistency of annotations.

pdf bib
Deep Syntactic Annotation: Tectogrammatical Representation and Beyond
Petr Sgall | Jarmila Panevová | Eva Hajičová
Proceedings of the Workshop Frontiers in Corpus Annotation at HLT-NAACL 2004

2002

pdf bib
Argument/Valency Structure in PropBank, LCS Database and Prague Dependency Treebank: A Comparative Pilot Study
Eva Hajičová | Ivona Kučerová
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

pdf bib
Topic-focus and Salience
Eva Hajičová | Petr Sgall
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

pdf bib abs
Tagging of very large corpora: Topic-Focus Articulation
Eva Buráňová | Eva Hajičová | Petr Sgall
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

After a brief characterization of the theory of the topic-focus articulation of the sentence (TFA), rules are formulated that determine the assignment of appropriate values of the TFA attribute in the process of syntactico-semantic tagging of a very large corpus of Czech.

pdf bib abs
Deletions and their reconstruction in tectogrammatical syntactic tagging of very large corpora
Eva Hajičová | Markéta Ceplová
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

The procedure of reconstruction of the underlying structure of sentences (in the process of tagging a very large corpus of Czech) is described, with special attention paid to the conditions under which the reconstruction of ellipted nodes is carried out.

pdf bib
Semantico-syntactic Tagging of Very Large Corpora: the Case of Restoration of Nodes on the Underlying Level
Eva Hajičová | Petr Sgall
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf bib
Coreference in Annotating a Large Corpus
Eva Hajičová | Jarmila Panevová | Petr Sgall
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1998

pdf bib
Movement rules revisited
Eva Hajičová
Processing of Dependency-Based Grammars

1995

pdf bib abs
An Automatic Procedure for Topic-Focus Identification
Eva Hajičová | Hana Skoumalová | Petr Sgall
Computational Linguistics, Volume 21, Number 1, March 1995

The dichotomy of topic and focus, based, in the Praguean Functional Generative Description, on the scale of communicative dynamism, is relevant not only for a possible placement of the sentence in a context, but also for its semantic interpretation. An automatic identification of topic and focus may use the input information on word order, on the systemic ordering of kinds of complementations (reflected by the underlying order of the items included in the focus), on definiteness, and on lexical semantic properties of words. An algorithm for the analysis of English sentences has been implemented and is discussed and illustrated on several examples.

1993

pdf bib abs
Identifying Topic and Focus by an Automatic Procedure
Eva Hajičová | Petr Sgall | Hana Skoumalová
Sixth Conference of the European Chapter of the Association for Computational Linguistics

An algorithm for automatic identification of topic and focus of the sentence is presented, based on dependency syntax and using written input, which is much more ambiguous than spoken utterance.

1992

pdf bib abs
Stock of Shared Knowledge - A Tool for Solving Pronominal Anaphora
Eva Hajičová | Vladislav Kuboň | Petr Kuboň
COLING 1992 Volume 1: The 14th International Conference on Computational Linguistics

The paper develops further the idea of using the notion of the stock of shared knowledge (SSK) for anaphora resolution following a more subtle treatment of the influence of the topic/focus articulation of the sentence on the degrees of salience of items of the SSK. An algorithmic evaluation procedure of the SSK is formulated taking into account the notions of contextual boundness, syntactic associations, complexity of the sentences and existence/nonexistence of possible competitors, and a general evaluating function is proposed, essential for the process of anaphora resolution. In the present paper the analysis is performed for Czech; however, the considerations are claimed to be of a universal validity, the actual relations between different factors and the values, of course, being language-dependent.

pdf bib abs
Derivation of Underlying Valency Frames From a Learner’s Dictionary
Alexandr Rosen | Eva Hajičová | Jan Hajič
COLING 1992 Volume 2: The 14th International Conference on Computational Linguistics

The authors collect lexical data for a module of English syntactic analysis in the context of a bilingual research project. The computer usable version of OALD (Hornby, 1974) is used as the primary source. The main focus is on the structure and derivation of valency frames for verbal entries in the target lexicon. Illustration of the complex relation between OALD's verb subcategorization codes and the target complementation paradigms is provided, and an approach to the derivation procedure design suggested.

1991

February 13-25, 1991

1990

pdf bib abs
Hierarchy of Salience and Discourse Analysis and Production
Eva Hajičová | Petr Kuboň | Vladislav Kuboň
COLING 1990 Volume 3: Papers presented to the 13th International Conference on Computational Linguistics

The hierarchy of salience of the items of the knowledge assumed by the speaker to be shared by him and by the hearer constitutes one aspect of a dynamic account of discourse (Sect. 1). It is claimed that a representation of this hierarchy is a good support for discourse analysis (reference assignment, Sect. 2) and for discourse production (pronominalization, definite description, Sect. 3).

1989

pdf bib
A Dependency-Based Parser for Topic and Focus
Eva Hajičová
Proceedings of the First International Workshop on Parsing Technologies

1988

pdf bib
Reasons Why We Use Dependency Grammar
Eva Hajičová
Coling Budapest 1988 Volume 2: International Conference on Computational Linguistics

1987

pdf bib abs
Fail-Soft (“Emergency”) Measures in a Production-Oriented MT System
Eva Hajičová | Zdeněk Kirschner
Third Conference of the European Chapter of the Association for Computational Linguistics

A system of fail-soft (emergency) measures for a production-oriented MT system is discussed, first stating the specific purposes of such a system and then showing how these measures are being used in the system of English-to-Czech machine translation as prepared by the group of mathematical linguistics at Charles University in Prague.

1986

pdf bib
Degrees of Understanding
Eva Hajičová | Petr Sgall
Coling 1986 Volume 1: The 11th International Conference on Computational Linguistics

1985

pdf bib
Towards an Automatic Identification of Topic and Focus
Eva Hajičová | Petr Sgall
Second Conference of the European Chapter of the Association for Computational Linguistics

1984

pdf bib
Inferencing on Linguistically Based Semantic Structures
Eva Hajičová | Milena Hnátková
10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics

1983

pdf bib abs
Structure of Sentence and Inferencing in Question Answering
Eva Hajičová | Petr Sgall
First Conference of the European Chapter of the Association for Computational Linguistics

In the present paper we characterize in more detail some of the aspects of a question answering system using as its starting point the underlying structure of sentences (which with some approaches can be identified with the level of meaning or of logical form). First of all, the criteria are described that are used to identify the elementary units of underlying structure and the operations conjoining them into complex units (Sect. 1), then the main types of units and operations resulting from an empirical investigation on the basis of the criteria are registered (Sect. 2), and finally the rules of inference, accounting for the relevant aspects of the relationship between linguistic and cognitive structures, are illustrated (Sect. 3).

1982

pdf bib abs
The Role of the Hierarchy of Activation in the Process of Natural Language Understanding
Eva Hajičová | Jarka Vrbová
Coling 1982: Proceedings of the Ninth International Conference on Computational Linguistics

The elements of the stock of knowledge shared by the speaker and the hearer change their salience, in the sense of being immediately accessible in the hearer's memory. The hierarchy of salience is argued to be a basic component of a mechanism serving for the identification of reference. Some of the regularities of this mechanism are discussed, the description of which is a necessary prerequisite of an automatic understanding of connected texts.

1980

pdf bib abs
Linguistic Meaning and Knowledge Representation in Automatic Understanding of Natural Language
Eva Hajičová | Petr Sgall
COLING 1980 Volume 1: The 8th International Conference on Computational Linguistics

The necessity of and means for distinguishing between a level of linguistic meaning and a domain of "factual knowledge" (or cognitive content) are argued for, supported by a survey of relevant operational criteria. The level of meaning is characterized as a safe base for computational applications, which allows for a set of inference rules accounting for the content (factual relations) of a given domain.