Viktor Nagy


Web-based frequency dictionaries for medium density languages
András Kornai | Péter Halácsy | Viktor Nagy | Csaba Oravecz | Viktor Trón | Dániel Varga
Proceedings of the 2nd International Workshop on Web as Corpus


Combining Symbolic and Statistical Methods in Morphological Analysis and Unknown Word Guessing
Attila Novák | Viktor Nagy | Csaba Oravecz
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Highly inflectional or agglutinative languages such as Hungarian have so many possible word forms that automatic methods that learn morphosyntactic annotation from a training corpus often run into data sparseness. One possible solution is to apply a comprehensive morphological analyzer that can analyze almost all word forms, alleviating the problem of unseen tokens. However, a smaller number of forms will still be unknown even to the morphological analyzer and must be handled by some guesser mechanism. This paper describes a hybrid method that combines symbolic and statistical information to provide lemmatization and suffix analyses for unknown word forms. The evaluation focuses on the induction of possible analyses and their respective lexical probabilities for unknown word forms in a part-of-speech tagging system.
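
As a rough illustration of what inducing lexical probabilities for unknown word forms can look like, the sketch below estimates a tag distribution from the longest word-final suffix seen in training data and optionally filters it through a set of permitted tags. It is not the paper's method: the function names, the toy word list, the coarse NOUN/VERB tags, and the `allowed` filter (a stand-in for symbolic constraints) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_suffix_model(tagged_words, max_len=4):
    """Count how often each word-final suffix co-occurs with each tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        for n in range(1, min(max_len, len(word)) + 1):
            counts[word[-n:]][tag] += 1
    return counts


def guess_tags(word, counts, max_len=4, allowed=None):
    """Return {tag: probability} based on the longest suffix seen in training.

    `allowed` is a hypothetical symbolic filter: when given, only tags in
    this set are kept before the counts are normalized.
    """
    for n in range(min(max_len, len(word)), 0, -1):
        tags = counts.get(word[-n:])
        if not tags:
            continue  # suffix unseen: back off to a shorter one
        if allowed is not None:
            tags = Counter({t: c for t, c in tags.items() if t in allowed})
        total = sum(tags.values())
        if total:
            return {t: c / total for t, c in tags.items()}
    return {}  # no suffix of the word was seen with a permitted tag


# Toy training data (illustrative only, not taken from the paper).
toy = [("házban", "NOUN"), ("kertben", "NOUN"),
       ("olvasok", "VERB"), ("házak", "NOUN")]
model = train_suffix_model(toy)
print(guess_tags("boltban", model))
print(guess_tags("xk", model, allowed={"NOUN"}))
```

Backing off from the longest suffix to shorter ones is one simple design choice; interpolating across suffix lengths would be another.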