Iván Mittelholcz


2020

pdf
The xtsv Framework and the Twelve Virtues of Pipelines
Balázs Indig | Bálint Sass | Iván Mittelholcz
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present xtsv, an abstract framework for building NLP pipelines. It covers several kinds of functionalities which can be implemented at an abstract level. We survey these features and argue that all are desired in a modern pipeline. The framework has a simple yet powerful internal communication format which is essentially tsv (tab separated values) with header plus some additional features. We put emphasis on the capabilities of the presented framework, for example its ability to allow new modules to be easily integrated or replaced, or the variety of its usage options. When a module is put into xtsv, all functionalities of the system are immediately available for that module, and the module can be be a part of an xtsv pipeline. The design also allows convenient investigation and manual correction of the data flow from one module to another. We demonstrate the power of our framework with a successful application: a concrete NLP pipeline for Hungarian called e-magyar text processing system (emtsv) which integrates Hungarian NLP tools in xtsv. All the advantages of the pipeline come from the inherent properties of the xtsv framework.

2019

pdf
One format to rule them all – The emtsv pipeline for Hungarian
Balázs Indig | Bálint Sass | Eszter Simon | Iván Mittelholcz | Noémi Vadász | Márton Makrai
Proceedings of the 13th Linguistic Annotation Workshop

We present a more efficient version of the e-magyar NLP pipeline for Hungarian called emtsv. It integrates Hungarian NLP tools in a framework whose individual modules can be developed or replaced independently and allows new ones to be added. The design also allows convenient investigation and manual correction of the data flow from one module to another. The improvements we publish include effective communication between the modules and support of the use of individual modules both in the chain and standing alone. Our goals are accomplished using extended tsv (tab separated values) files, a simple, uniform, generic and self-documenting input/output format. Our vision is maintaining the system for a long time and making it easier for external developers to fit their own modules into the system, thus sharing existing competencies in the field of processing Hungarian, a mid-resourced language. The source code is available under LGPL 3.0 license at https://github.com/dlt-rilmta/emtsv .

2018

pdf
Automatic Generation of Wiktionary Entries for Finno-Ugric Minority Languages
Zsanett Ferenczi | Iván Mittelholcz | Eszter Simon
Proceedings of the Fourth International Workshop on Computational Linguistics of Uralic Languages

pdf
E-magyar – A Digital Language Processing System
Tamás Váradi | Eszter Simon | Bálint Sass | Iván Mittelholcz | Attila Novák | Balázs Indig | Richárd Farkas | Veronika Vincze
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf
Evaluation of Dictionary Creating Methods for Finno-Ugric Minority Languages
Zsanett Ferenczi | Iván Mittelholcz | Eszter Simon | Tamás Váradi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)