Péter Halácsy

Also published as: Péter Halácsky


2008

Parallel Creation of Gigaword Corpora for Medium Density Languages - an Interim Report
Péter Halácsy | András Kornai | Péter Németh | Dániel Varga
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08)

For increased speed in developing gigaword language resources for medium resource density languages, we integrated several FOSS tools in the HUN* toolkit. While the speed and efficiency of the resulting pipeline have surpassed our expectations, our experience in developing LDC-style resource packages for Uzbek and Kurdish makes clear that neither the data collection nor the subsequent processing stages can be fully automated.

2007

Poster paper: HunPos – an open source trigram tagger
Péter Halácsy | András Kornai | Csaba Oravecz
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2006

Using a morphological analyzer in high precision POS tagging of Hungarian
Péter Halácsy | András Kornai | Csaba Oravecz | Viktor Trón | Dániel Varga
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The paper presents an evaluation of maxent POS disambiguation systems that incorporate an open source morphological analyzer to constrain the probabilistic models. The experiments show that the best proposed architecture, which is the first application of the maximum entropy framework in a Hungarian NLP task, outperforms comparable state-of-the-art tagging methods and is able to handle out-of-vocabulary items robustly, allowing for efficient analysis of large (web-based) corpora.

Morphdb.hu: Hungarian lexical database and morphological grammar
Viktor Trón | Péter Halácsy | Péter Rebrus | András Rung | Péter Vajda | Eszter Simon
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes morphdb.hu, a Hungarian lexical database and morphological grammar. Morphdb.hu is the outcome of a several-year collaborative effort and represents the resource with the widest coverage and broadest range of applicability presently available for Hungarian. The grammar resource is the formalization of well-founded theoretical decisions handling inflection and productive derivation. The lexical database was created by merging three independent lexical databases, and the resulting resource was further extended.

Web-based frequency dictionaries for medium density languages
András Kornai | Péter Halácsy | Viktor Nagy | Csaba Oravecz | Viktor Trón | Dániel Varga
Proceedings of the 2nd International Workshop on Web as Corpus

2005

Hunmorph: Open Source Word Analysis
Viktor Trón | György Gyepesi | Péter Halácsky | András Kornai | László Németh | Dániel Varga
Proceedings of Workshop on Software

2004

Creating Open Language Resources for Hungarian
Péter Halácsy | András Kornai | László Németh | András Rung | István Szakadát | Viktor Trón
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)