Noémi Vadász


2024

pdf
HuLU: Hungarian Language Understanding Benchmark Kit
Noémi Ligeti-Nagy | Gergő Ferenczi | Enikő Héja | László János Laki | Noémi Vadász | Zijian Győző Yang | Tamás Váradi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The paper introduces the Hungarian Language Understanding (HuLU) benchmark, a comprehensive assessment framework designed to evaluate the performance of neural language models on Hungarian language tasks. Inspired by the renowned GLUE and SuperGLUE benchmarks, HuLU aims to address the challenges specific to Hungarian language processing. The benchmark consists of various datasets, each representing different linguistic phenomena and task complexities. Moreover, the paper presents a web service developed for HuLU, offering a user-friendly interface for model evaluation. This platform not only ensures consistent assessment but also fosters transparency by maintaining a leaderboard showcasing model performances. Preliminary evaluations of various LMMs on HuLU datasets indicate that while Hungarian models show promise, there’s room for improvement to match the proficiency of English-centric models in their native language.

2022

pdf
Building a Manually Annotated Hungarian Coreference Corpus: Workflow and Tools
Noémi Vadász
Proceedings of the Fifth Workshop on Computational Models of Reference, Anaphora and Coreference

This paper presents the complete workflow of building a manually annotated Hungarian corpus, KorKor, with particular reference to anaphora and coreference annotation. All linguistic annotation layers were corrected manually. The corpus is freely available in two formats. The paper gives insight into the process of setting up the workflow and the challenges that have arisen.

2019

pdf
What does the Nom say? An algorithm for case disambiguation in Hungarian
Noémi Ligeti-Nagy | Andrea Dömötör | Noémi Vadász
Proceedings of the Fifth International Workshop on Computational Linguistics for Uralic Languages

pdf
One format to rule them all – The emtsv pipeline for Hungarian
Balázs Indig | Bálint Sass | Eszter Simon | Iván Mittelholcz | Noémi Vadász | Márton Makrai
Proceedings of the 13th Linguistic Annotation Workshop

We present a more efficient version of the e-magyar NLP pipeline for Hungarian called emtsv. It integrates Hungarian NLP tools in a framework whose individual modules can be developed or replaced independently and allows new ones to be added. The design also allows convenient investigation and manual correction of the data flow from one module to another. The improvements we publish include effective communication between the modules and support of the use of individual modules both in the chain and standing alone. Our goals are accomplished using extended tsv (tab separated values) files, a simple, uniform, generic and self-documenting input/output format. Our vision is maintaining the system for a long time and making it easier for external developers to fit their own modules into the system, thus sharing existing competencies in the field of processing Hungarian, a mid-resourced language. The source code is available under LGPL 3.0 license at https://github.com/dlt-rilmta/emtsv .