Eszter Simon


2019

One format to rule them all – The emtsv pipeline for Hungarian
Balázs Indig | Bálint Sass | Eszter Simon | Iván Mittelholcz | Noémi Vadász | Márton Makrai
Proceedings of the 13th Linguistic Annotation Workshop

We present a more efficient version of the e-magyar NLP pipeline for Hungarian, called emtsv. It integrates Hungarian NLP tools in a framework whose individual modules can be developed or replaced independently and to which new modules can be added. The design also allows convenient inspection and manual correction of the data passed from one module to the next. The improvements we publish include effective communication between the modules and support for using individual modules both in the chain and as stand-alone tools. Our goals are accomplished using extended tsv (tab-separated values) files, a simple, uniform, generic and self-documenting input/output format. Our aim is to maintain the system over the long term and to make it easier for external developers to fit their own modules into it, thus sharing existing expertise in processing Hungarian, a mid-resourced language. The source code is available under the LGPL 3.0 license at https://github.com/dlt-rilmta/emtsv .
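
To make the idea of a self-documenting tsv data flow concrete, here is a minimal Python sketch under stated assumptions: the first line names the columns, each further line holds one token with tab-separated fields, an empty line ends a sentence, and each module looks up the columns it needs by name and appends one new column. The functions `parse`, `add_column` and `serialize` and the toy `lower` and `stop` modules are hypothetical illustrations, not the emtsv API.

```python
from typing import Callable, List, Tuple

# Hypothetical sketch of an extended-tsv-like stream; not emtsv's actual API.
# Assumed layout: a header line naming the columns, one token per line,
# tab-separated fields, and an empty line after each sentence.
Row = List[str]
Sentence = List[Row]


def parse(text: str) -> Tuple[List[str], List[Sentence]]:
    """Split the text into its column names and a list of sentences."""
    lines = text.split("\n")
    header = lines[0].split("\t")
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in lines[1:]:
        if line:
            current.append(line.split("\t"))
        elif current:  # an empty line closes the current sentence
            sentences.append(current)
            current = []
    if current:  # tolerate a missing final empty line
        sentences.append(current)
    return header, sentences


def add_column(header: List[str], sentences: List[Sentence], needs: List[str],
               new_name: str, func: Callable[..., str]) -> Tuple[List[str], List[Sentence]]:
    """Run one module: read its input columns by name and append one output column."""
    positions = [header.index(name) for name in needs]  # ValueError if a column is missing
    new_sentences = [[row + [func(*(row[i] for i in positions))] for row in sent]
                     for sent in sentences]
    return header + [new_name], new_sentences


def serialize(header: List[str], sentences: List[Sentence]) -> str:
    """Write the header, then each sentence followed by an empty line."""
    out = ["\t".join(header)]
    for sent in sentences:
        out.extend("\t".join(row) for row in sent)
        out.append("")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    text = "form\nA\nkutya\nugat\n.\n\nJó\nreggelt\n!\n"
    header, sents = parse(text)
    # Two toy modules chained by column name; the second consumes the first one's output.
    header, sents = add_column(header, sents, ["form"], "lower", str.lower)
    header, sents = add_column(header, sents, ["lower"], "stop",
                               lambda w: "1" if w in {"a", "az", "egy"} else "0")
    print(serialize(header, sents), end="")
```

Because modules in this sketch address columns by header name rather than by position, a module can run alone on any file that already contains its input columns, or be replaced by another module that produces the same column.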

2018

Automatic Generation of Wiktionary Entries for Finno-Ugric Minority Languages
Zsanett Ferenczi | Iván Mittelholcz | Eszter Simon
Proceedings of the Fourth International Workshop on Computational Linguistics of Uralic Languages

E-magyar – A Digital Language Processing System
Tamás Váradi | Eszter Simon | Bálint Sass | Iván Mittelholcz | Attila Novák | Balázs Indig | Richárd Farkas | Veronika Vincze
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Evaluation of Dictionary Creating Methods for Finno-Ugric Minority Languages
Zsanett Ferenczi | Iván Mittelholcz | Eszter Simon | Tamás Váradi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Languages under the influence: Building a database of Uralic languages
Eszter Simon | Nikolett Mus
Proceedings of the Third Workshop on Computational Linguistics for Uralic Languages

2016

Universal Morphology for Old Hungarian
Eszter Simon | Veronika Vincze
Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

2014

Media monitoring and information extraction for the highly inflected agglutinative language Hungarian
Júlia Pajzs | Ralf Steinberger | Maud Ehrmann | Mohamed Ebrahim | Leonida Della Rocca | Stefano Bucci | Eszter Simon | Tamás Váradi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The Europe Media Monitor (EMM) is a fully automatic system that analyses written online news by gathering articles in over 70 languages and applying text analysis software to currently 21 languages, without using linguistic tools such as parsers, part-of-speech taggers or morphological analysers. In this paper, we describe the effort of adding Hungarian text mining tools to EMM for news gathering; document categorisation; named entity recognition and classification for persons, organisations and locations; name lemmatisation; quotation recognition; and cross-lingual linking of related news clusters. The major challenge of dealing with Hungarian is its high degree of inflection and agglutination. We present several experiments in which we apply linguistically light-weight methods to deal with inflection, and we propose a method to overcome these challenges. We also present detailed frequency lists of Hungarian person and location name suffixes as found in real-life news texts. This empirical data can be used to draw further conclusions and to improve existing named entity recognition software. Within EMM, the solutions described here will also be applied to other morphologically complex languages, such as the Slavic languages. The media monitoring and analysis system EMM is freely accessible online at http://emm.newsbrief.eu/overview.html.
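
As a rough illustration of what a linguistically light-weight treatment of inflection can look like, the sketch below maps inflected tokens to known names by checking whether the remainder after the name is one of a small, hand-picked set of Hungarian case suffixes. The suffix set, the `match_name` function and the example names are assumptions for illustration; this is not the method described in the paper.

```python
from typing import Iterable, Optional

# Hypothetical, deliberately incomplete set of Hungarian case suffixes
# (illustrative only; not taken from the paper's frequency lists).
SUFFIXES = {
    "nak", "nek",                # dative
    "ban", "ben",                # inessive
    "ból", "ből",                # elative
    "ról", "ről",                # delative
    "tól", "től",                # ablative
    "hoz", "hez", "höz",         # allative
    "n", "on", "en", "ön",       # superessive
    "t", "ot", "at", "et", "öt", # accusative
}


def match_name(token: str, known_names: Iterable[str]) -> Optional[str]:
    """Return the known name that `token` is a suffixed form of, or None."""
    # Try longer names first so "Budapest" wins over a shorter prefix such as "Buda".
    for name in sorted(known_names, key=len, reverse=True):
        if token == name:
            return name
        if token.startswith(name) and token[len(name):] in SUFFIXES:
            return name
    return None


if __name__ == "__main__":
    names = {"Budapest", "Buda", "Párizs", "Orbán", "Merkel"}
    for tok in ["Budapesten", "Párizsban", "Orbánnak", "Merkellel"]:
        print(tok, "->", match_name(tok, names))
```

Plain suffix stripping of this kind misses forms where the stem or the suffix changes, such as vowel lengthening (Anna → Annának) or the assimilation of -val/-vel (Merkel → Merkellel), which is one reason why inflection is a challenge for Hungarian name recognition.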

2012

Automatically generated NE tagged corpora for English and Hungarian
Eszter Simon | Dávid Márk Nemeskey
Proceedings of the 4th Named Entity Workshop (NEWS) 2012

2007

GYDER: Maxent Metonymy Resolution
Richárd Farkas | Eszter Simon | György Szarvas | Dániel Varga
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

Morphdb.hu: Hungarian lexical database and morphological grammar
Viktor Trón | Péter Halácsy | Péter Rebrus | András Rung | Péter Vajda | Eszter Simon
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes morphdb.hu, a Hungarian lexical database and morphological grammar. Morphdb.hu is the outcome of a several-year collaborative effort and represents the resource with the widest coverage and broadest range of applicability presently available for Hungarian. The grammar formalizes well-founded theoretical decisions on inflection and productive derivation. The lexical database was created by merging three independent lexical databases, and the resulting resource was further extended.