Péter Halácsy

Also published as: Péter Halácsky

2008

Google for the Linguist on a Budget
András Kornai | Péter Halácsy
Proceedings of the 4th Web as Corpus Workshop

In this paper, we present GLB, yet another open source, free system to create and exploit linguistic corpora gathered from the web. A simple, robust web crawl algorithm, a multi-dimensional information retrieval tool, and a crude parallelization mechanism are proposed, especially for researchers working in resource-limited environments.

Parallel Creation of Gigaword Corpora for Medium Density Languages - an Interim Report
Péter Halácsy | András Kornai | Péter Németh | Dániel Varga
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

For increased speed in developing gigaword language resources for medium resource density languages, we integrated several FOSS tools in the HUN* toolkit. While the speed and efficiency of the resulting pipeline have surpassed our expectations, our experience in developing LDC-style resource packages for Uzbek and Kurdish makes clear that neither the data collection nor the subsequent processing stages can be fully automated.

2007

Poster paper: HunPos – an open source trigram tagger
Péter Halácsy | András Kornai | Csaba Oravecz
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2006

Using a morphological analyzer in high precision POS tagging of Hungarian
Péter Halácsy | András Kornai | Csaba Oravecz | Viktor Trón | Dániel Varga
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The paper presents an evaluation of maxent POS disambiguation systems that incorporate an open source morphological analyzer to constrain the probabilistic models. The experiments show that the best proposed architecture, which is the first application of the maximum entropy framework in a Hungarian NLP task, outperforms comparable state-of-the-art tagging methods and is able to handle out-of-vocabulary items robustly, allowing for efficient analysis of large (web-based) corpora.

pdf bib abs

This paper describes morphdb.hu, a Hungarian lexical database and morphological grammar. Morphdb.hu is the outcome of a several-year collaborative effort and represents the resource with the widest coverage and broadest range of applicability presently available for Hungarian. The grammar resource is the formalization of well-founded theoretical decisions handling inflection and productive derivation. The lexical database was created by merging three independent lexical databases, and the resulting resource was further extended.
