2019
Elliptical Constructions in Estonian UD Treebank
Kadri Muischnek | Liisi Torga
Proceedings of the Fifth International Workshop on Computational Linguistics for Uralic Languages
2017
Estonian Copular and Existential Constructions as an UD Annotation Problem
Kadri Muischnek | Kaili Müürisep
Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017)
2016
Estonian Dependency Treebank: from Constraint Grammar tagset to Universal Dependencies
Kadri Muischnek | Kaili Müürisep | Tiina Puolakainen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper presents the first version of the Estonian Universal Dependencies Treebank, which has been semi-automatically derived from the Estonian Dependency Treebank and comprises ca 400,000 words (ca 30,000 sentences) representing the genres of fiction, newspapers and scientific writing. The article analyses the differences between the two annotation schemes and describes the conversion procedure to the Universal Dependencies format. The conversion was carried out with manually created Constraint Grammar transfer rules. Because the rules can consider unbounded context and can combine lexical information with both flat and tree-structure features, the method has proved reliable and flexible enough to handle most of the transformations. The automatic conversion procedure achieved LAS 95.2%, UAS 96.3% and LA 98.4%. With punctuation marks excluded from the calculations, the figures were LAS 96.4%, UAS 97.7% and LA 98.2%. Still, the guidelines and methodology need refinement in order to re-annotate some syntactic phenomena, e.g. inter-clausal relations. Although the automatic rules usually make quite a good guess even in obscure conditions, some relations should be checked and annotated manually after the main conversion.
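For readers unfamiliar with the three scores reported in the abstract, the sketch below shows how LAS (labelled attachment score), UAS (unlabelled attachment score) and LA (label accuracy) are commonly computed. It is a minimal illustration, not the authors' evaluation script: the `(head, label)` token layout, the function name `attachment_scores` and the toy sentence are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical token layout for illustration: (head index, dependency label).
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Dict[str, float]:
    """Return LAS, UAS and LA as fractions of tokens scored correct."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    n = len(gold)
    # UAS: the predicted head matches the gold head.
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    # LA: the predicted label matches the gold label.
    la = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n
    # LAS: both head and label match.
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return {"LAS": las, "UAS": uas, "LA": la}


# Toy four-token sentence (invented data, not taken from the treebank).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]
scores = attachment_scores(gold, pred)
print(scores)
```

In this toy case the third token has the right head but the wrong label, and the fourth has the right label but the wrong head, so LAS is lower than both UAS and LA. Excluding punctuation, as in the second set of figures in the abstract, would amount to filtering such tokens out of both lists before scoring.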
2012
Robust clause boundary identification for corpus annotation
Heiki-Jaan Kaalep | Kadri Muischnek
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
The paper describes a rule-based system for tagging clause boundaries, implemented for annotating the Estonian Reference Corpus of the University of Tartu, a collection of written texts containing ca 245 million running words and available for querying via the Keeleveeb language portal. The system needs information about the parts of speech and grammatical categories coded in the word-forms, i.e. it takes morphologically annotated text as input but requires no information about the syntactic structure of the sentence. Among the strong points of the system is the identification of parenthesis and embedded clauses, i.e. clauses that are inserted into another clause, dividing it into two separate parts in the linear text, for example a relative clause following its head noun. That enables a corpus query system to unite the otherwise divided clause, a feature that usually presupposes full parsing. The overall precision of the system is 95% and the recall is 96%. When ordinary clause boundary detection and the detection of parenthesis and embedded clause boundaries are evaluated separately, detecting an ordinary clause boundary (recall 98%, precision 96%) turns out to be an easier task than detecting an embedded clause (recall 79%, precision 100%).
2011
Morphological analysis of a non-standard language variety
Heiki-Jaan Kaalep | Kadri Muischnek
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)
2007
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)
Joakim Nivre | Heiki-Jaan Kaalep | Kadri Muischnek | Mare Koit
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)
Estonian-English Statistical Machine Translation: the First Results
Mark Fishel | Heiki-Jaan Kaalep | Kadri Muischnek
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)
2006
Multi-word verbs in a flective language: the case of Estonian
Heiki-Jaan Kaalep | Kadri Muischnek
Proceedings of the Workshop on Multi-word-expressions in a multilingual context
2002
Using the Text Corpus to Create a Comprehensive List of Phrasal Verbs
Heiki-Jaan Kaalep | Kadri Muischnek
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)