Karel Pala


2012

Legal electronic dictionary for Czech
František Cvrček | Karel Pala | Pavel Rychlý
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents the results of the Czech Legal Electronic Dictionary (PES) project. During the four-year project, a large legal terminological dictionary of Czech was created in the form of an electronic lexical database enriched with a hierarchical ontology of legal terms. It contains approx. 10,000 entries: legal terms together with their ontological relations and hypertext references. In the second part of the project, a web interface based on the DEBII platform was designed and implemented that allows users to browse and search the database effectively. At the same time, the Czech Dictionary of Legal Terms will be generated from the database and later printed as a book. Inter-annotator agreement in the manual selection of legal terms was high, approx. 95%.

2010

Using Ontologies for Semi-automatic Linking VerbaLex with FrameNet
Jiří Materna | Karel Pala
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This work presents a method of linking verbs and their valency frames in the VerbaLex database, developed at the Centre for NLP at the Faculty of Informatics, Masaryk University, to the frames in Berkeley FrameNet. While completely manual work may take a long time, the proposed semi-automatic approach requires less human effort to reach sufficient results. The method of linking VerbaLex frames to FrameNet frames consists of two phases. The goal of the first phase is to find an appropriate FrameNet frame for each frame in VerbaLex. The second phase assigns FrameNet frame elements to the deep semantic roles in VerbaLex. In this work, the main emphasis is put on exploiting the ontologies behind VerbaLex and FrameNet. In particular, the method of linking FrameNet frame elements with VerbaLex semantic roles is built using the information provided by the ontology of semantic types in FrameNet. Based on the proposed technique, a semi-automatic linking tool has been developed. By linking FrameNet to VerbaLex, we are able to find a non-trivial subset of the interlingual FrameNet frames (including their frame-to-frame relations), which could be used as a core for building a FrameNet for Czech.

Lexical Resources for Noun Compounds in Czech, English and Zulu
Karel Pala | Christiane Fellbaum | Sonja Bosch
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we discuss noun compounding, a highly generative, productive process, in three distinct languages: Czech, English and Zulu. Derivational morphology presents a large grey area between regular, compositional and idiosyncratic, non-compositional word forms. The structural properties of compounds in each of the languages are reviewed and contrasted. Whereas English compounds are head-final and thus left-branching, Czech and Zulu compounds usually consist of a leftmost governing head and a rightmost dependent element. Semantic properties of compounds are discussed with special reference to semantic relations between compound members, which show universal patterns cross-linguistically, but idiosyncratic, language-specific compounds are also identified. The integration of compounds into lexical resources, and WordNets in particular, remains a challenge that needs to be considered in terms of the compounds’ syntactic idiosyncrasy and semantic compositionality. Experiments with processing compounds in Czech, English and Zulu are reported and partly evaluated. The resulting partial lists of Czech, English and Zulu compounds are also described.

2008

Czech MWE Database
Karel Pala | Lukáš Svoboda | Pavel Šmerk
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we deal with a recently developed large Czech MWE database, which currently contains 160,000 MWEs (treated as lexical units). It was compiled from various resources such as encyclopedias and dictionaries, public databases of proper names and toponyms, collocations obtained from Czech WordNet, lists of botanical and zoological terms, and others. We describe the structure of the database, compare the resulting MWE database with corpus data from the Czech National Corpus SYN2000 (approx. 100 million tokens), and present the results of this comparison. These MWEs have not been obtained from the corpus since their frequencies in it are rather low. To obtain a more complete list of MWEs we propose and use a technique exploiting the Word Sketch Engine, which allows us to work with statistical parameters such as the frequency of MWEs and their components as well as with the salience of whole MWEs. We also discuss how the database can be exploited to develop more adequate tagging and lemmatization. The final goal is to be able to recognize MWEs in corpus text and lemmatize them as complete lexical units, i.e. to make tagging and lemmatization more adequate.

2007

Verb Valency Semantic Representation for Deep Linguistic Processing
Aleš Horák | Karel Pala | Marie Duží | Pavel Materna
ACL 2007 Workshop on Deep Linguistic Processing

Derivational Relations in Czech WordNet
Karel Pala | Dana Hlaváčková
Proceedings of the Workshop on Balto-Slavonic Natural Language Processing

2004

Top Ontology as a Tool for Semantic Role Tagging
Karel Pala | Pavel Smrz
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

Relations between Inflectional and Derivation Patterns
Karel Pala | Radek Sedláček | Marek Veber
Proceedings of the 2003 EACL Workshop on Morphological Processing of Slavic Languages

2002

Databases of Heterogeneous Segments for Concatenative Speech Synthesis
Ivan Kopeček | Karel Pala
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

A Procedure for Word Derivational Processes Concerning Lexicon Extension in Highly Inflected Languages
Klára Osolsobě | Karel Pala | Radek Sedláček | Marek Veber
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

Application of WordNet ILR in Czech Word-formation
Jana Klímová | Karel Pala
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)