This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
Karel Pala
This paper presents the results of the Czech Legal Electronic Dictionary project (PES). During the four-year project, a large legal terminological dictionary of Czech was created in the form of an electronic lexical database enriched with a hierarchical ontology of legal terms. It contains approximately 10,000 entries: legal terms together with their ontological relations and hypertext references. In the second part of the project, a web interface based on the DEBII platform was designed and implemented that allows users to browse and search the database effectively. The Czech Dictionary of Legal Terms will also be generated from the database and later printed as a book. Inter-annotator agreement in the manual selection of legal terms was high, at approximately 95 %.
This work presents a method of linking verbs and their valency frames in the VerbaLex database, developed at the Centre for NLP at the Faculty of Informatics, Masaryk University, to the frames in Berkeley FrameNet. While a completely manual effort could take a long time, the proposed semi-automatic approach requires less human effort to reach sufficient results. The method of linking VerbaLex frames to FrameNet frames consists of two phases: the goal of the first is to find an appropriate FrameNet frame for each frame in VerbaLex; the second assigns FrameNet frame elements to the deep semantic roles in VerbaLex. The main emphasis is put on exploiting the ontologies behind VerbaLex and FrameNet. In particular, the method of linking FrameNet frame elements with VerbaLex semantic roles builds on the information provided by the ontology of semantic types in FrameNet. Based on the proposed technique, a semi-automatic linking tool has been developed. By linking FrameNet to VerbaLex, we are able to find a non-trivial subset of the interlingual FrameNet frames (including their frame-to-frame relations), which could serve as a core for building a FrameNet for Czech.
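The two-phase procedure described in the abstract can be sketched roughly as follows. All of the data here is invented for illustration: the frame inventories, the lexical-unit overlap scoring in phase 1, and the shared semantic-type keys in phase 2 are assumptions, not the authors' actual implementation.

```python
# Phase 1 (hypothetical): link each VerbaLex frame to the FrameNet frame
# whose lexical units overlap most with the frame's verbs.
VERBALEX = {"darovat:1": {"give", "donate"}, "koupit:1": {"buy", "purchase"}}
FRAMENET = {"Giving": {"give", "donate", "hand"}, "Commerce_buy": {"buy", "purchase"}}

def phase1_link(verbalex, framenet):
    """For each VerbaLex frame, pick the FrameNet frame with maximal
    lexical-unit overlap (illustrative scoring only)."""
    return {v: max(framenet, key=lambda f: len(lemmas & framenet[f]))
            for v, lemmas in verbalex.items()}

# Phase 2 (hypothetical): VerbaLex deep roles and FrameNet frame elements,
# keyed by a shared ontology of semantic types (also invented here).
ROLE_TYPES = {"AG": "Sentient", "OBJ": "Physical_object"}
FE_TYPES = {"Donor": "Sentient", "Theme": "Physical_object", "Recipient": "Sentient"}

def phase2_align(role_types, fe_types):
    """Propose frame elements for each deep role when their semantic
    types from the two ontologies agree."""
    return {role: sorted(fe for fe, t2 in fe_types.items() if t2 == t)
            for role, t in role_types.items()}

links = phase1_link(VERBALEX, FRAMENET)
alignment = phase2_align(ROLE_TYPES, FE_TYPES)
```

In this toy setup, phase 2 can still leave ambiguity (a Sentient role matches both Donor and Recipient), which is exactly where the semi-automatic tool would fall back to a human decision.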
In this paper we discuss noun compounding, a highly generative and productive process, in three distinct languages: Czech, English and Zulu. Derivational morphology presents a large grey area between regular, compositional word forms and idiosyncratic, non-compositional ones. The structural properties of compounds in each of the languages are reviewed and contrasted. Whereas English compounds are head-final and thus left-branching, Czech and Zulu compounds usually consist of a leftmost governing head and a rightmost dependent element. The semantic properties of compounds are discussed with special reference to the semantic relations between compound members, which cross-linguistically show universal patterns, although idiosyncratic, language-specific compounds are also identified. The integration of compounds into lexical resources, and into WordNets in particular, remains a challenge that needs to be considered in terms of the compounds' syntactic idiosyncrasy and semantic compositionality. Experiments with processing compounds in Czech, English and Zulu are reported and partly evaluated. The partial lists of Czech, English and Zulu compounds obtained are also described.
In this paper we deal with a recently developed large database of Czech multi-word expressions (MWEs), currently containing 160,000 MWEs treated as lexical units. It was compiled from various resources such as encyclopedias and dictionaries, public databases of proper names and toponyms, collocations obtained from Czech WordNet, lists of botanical and zoological terms, and others. We describe the structure of the database, compare it with the corpus data from the Czech National Corpus SYN2000 (approximately 100 million tokens), and present the results of this comparison. These MWEs could not have been obtained from the corpus, since their frequencies in it are rather low. To obtain a more complete list of MWEs, we propose and use a technique exploiting the Word Sketch Engine, which allows us to work with statistical parameters such as the frequency of MWEs and their components, as well as with the salience of whole MWEs. We also discuss exploiting the database for more adequate tagging and lemmatization. The final goal is to be able to recognize MWEs in corpus text and lemmatize them as complete lexical units, i.e. to make tagging and lemmatization more adequate.
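A minimal sketch of this kind of corpus-based candidate filtering, combining a frequency threshold with a logDice-style salience score (the association measure used in the Sketch Engine). The corpus counts and thresholds below are invented for the example and are not taken from SYN2000 or the authors' setup.

```python
import math

# Hypothetical MWE candidate filter in the spirit of the technique described
# above: keep bigram candidates whose corpus frequency and salience both
# pass a threshold. Counts are invented for illustration.

def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2(2*f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

def mwe_candidates(bigrams, unigrams, min_freq=5, min_salience=7.0):
    """Return (bigram, salience) pairs passing both thresholds, best first."""
    out = []
    for (x, y), f_xy in bigrams.items():
        if f_xy < min_freq:           # too rare to trust
            continue
        score = log_dice(f_xy, unigrams[x], unigrams[y])
        if score >= min_salience:     # strongly associated pair
            out.append(((x, y), round(score, 2)))
    return sorted(out, key=lambda p: -p[1])

bigrams = {("hlavní", "město"): 120, ("nový", "rok"): 80, ("velký", "pes"): 3}
unigrams = {"hlavní": 300, "město": 400, "nový": 5000, "rok": 6000,
            "velký": 2000, "pes": 150}
candidates = mwe_candidates(bigrams, unigrams)
```

Here the rare pair is dropped by the frequency cut before salience is even computed, which mirrors why low-frequency MWEs had to come from lexical resources rather than from the corpus alone.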