CzeDLex is an electronic lexicon of Czech discourse connectives whose data come from a large treebank annotated with discourse relations. Its new version, CzeDLex 0.6, contains significantly more manually processed entries than the previous version 0.5, which was published in 2017. Its structure has also been modified to allow primary connectives to appear with multiple entries for a single discourse sense. The lexicon comes in several formats, both human- and machine-readable, and can be searched in PML Tree Query, a user-friendly and powerful search tool for all kinds of linguistically annotated treebanks. The main purpose of this demo paper is to present the new version of the lexicon and to demonstrate how various types of information can be mined from it using PML Tree Query; we present several examples of search queries over the lexicon data along with their results. CzeDLex 0.6 was officially released in December 2019 under a Creative Commons license and is available online.
Extracting a Lexicon of Discourse Connectives in Czech from an Annotated Corpus
Pavlína Synková | Magdaléna Rysová | Lucie Poláková | Jiří Mírovský
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation