Noémi Ligeti-Nagy


2025

OpenHuEval: Evaluating Large Language Model on Hungarian Specifics
Haote Yang | Xingjian Wei | Jiang Wu | Noémi Ligeti-Nagy | Jiaxing Sun | Yinfan Wang | Győző Zijian Yang | Junyuan Gao | Jingchao Wang | Bowen Jiang | Shasha Wang | Nanjun Yu | Zihao Zhang | Shixin Hong | Hongwei Liu | Wei Li | Songyang Zhang | Dahua Lin | Lijun Wu | Gábor Prószéky | Conghui He
Findings of the Association for Computational Linguistics: ACL 2025

We introduce OpenHuEval, the first benchmark for LLMs focusing on the Hungarian language and specifics. OpenHuEval is constructed from a vast collection of Hungarian-specific materials drawn from multiple sources. In its construction, we incorporated the latest design principles for evaluating LLMs, such as using real user queries from the internet, emphasizing the assessment of LLMs’ generative capabilities, and employing LLM-as-judge to enhance the multidimensionality and accuracy of evaluations. In total, OpenHuEval encompasses eight Hungarian-specific dimensions, featuring five tasks and 3953 questions. OpenHuEval thus provides a comprehensive, in-depth, and scientifically accurate assessment of LLM performance in the context of the Hungarian language and its specifics. We evaluated current mainstream LLMs, including both traditional LLMs and recently developed Large Reasoning Models (LRMs). The results demonstrate a significant need for evaluation and model optimization tailored to the Hungarian language and specifics. We also established a framework for analyzing the thinking processes of LRMs with OpenHuEval, revealing intrinsic patterns and mechanisms of these models in non-English languages, with Hungarian serving as a representative example. We will release OpenHuEval at https://github.com/opendatalab/OpenHuEval.

2024

HuLU: Hungarian Language Understanding Benchmark Kit
Noémi Ligeti-Nagy | Gergő Ferenczi | Enikő Héja | László János Laki | Noémi Vadász | Zijian Győző Yang | Tamás Váradi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The paper introduces the Hungarian Language Understanding (HuLU) benchmark, a comprehensive assessment framework designed to evaluate the performance of neural language models on Hungarian language tasks. Inspired by the renowned GLUE and SuperGLUE benchmarks, HuLU aims to address the challenges specific to Hungarian language processing. The benchmark consists of various datasets, each representing different linguistic phenomena and task complexities. Moreover, the paper presents a web service developed for HuLU, offering a user-friendly interface for model evaluation. This platform not only ensures consistent assessment but also fosters transparency by maintaining a leaderboard showcasing model performances. Preliminary evaluations of various language models on HuLU datasets indicate that while Hungarian models show promise, there is room for improvement to match the proficiency of English-centric models in their native language.

2022

A Clique-based Graphical Approach to Detect Interpretable Adjectival Senses in Hungarian
Enikő Héja | Noémi Ligeti-Nagy
Proceedings of TextGraphs-16: Graph-based Methods for Natural Language Processing

The present paper introduces ongoing research that aims to detect interpretable adjectival senses from monolingual corpora by applying an unsupervised word sense induction (WSI) approach. We expect the findings of our investigation to contribute to the work of lexicographers and linguists and to facilitate the creation of benchmarks with semantic information for the NLP community. To this end, we set up four criteria to distinguish between senses. We experiment with a graphical approach to model our criteria and then perform a detailed, linguistically motivated manual evaluation of the results.

2019

What does the Nom say? An algorithm for case disambiguation in Hungarian
Noémi Ligeti-Nagy | Andrea Dömötör | Noémi Vadász
Proceedings of the Fifth International Workshop on Computational Linguistics for Uralic Languages

Creation of a corpus with semantic role labels for Hungarian
Attila Novák | László Laki | Borbála Novák | Andrea Dömötör | Noémi Ligeti-Nagy | Ágnes Kalivoda
Proceedings of the 13th Linguistic Annotation Workshop

This article presents ongoing research whose immediate goal is to create a corpus annotated with semantic role labels for Hungarian that can be used to train a parser-based system capable of formulating relevant questions about the text it processes. We briefly describe the objectives of our research and our efforts at eliminating errors in the Hungarian Universal Dependencies corpus, which we use as the base of our annotation effort; creating a Hungarian verbal argument database annotated with thematic roles; classifying adjuncts; and matching verbal argument frames to specific occurrences of verbs and participles in the corpus.

2018

What’s Wrong, Python? – A Visual Differ and Graph Library for NLP in Python
Balázs Indig | András Simonyi | Noémi Ligeti-Nagy
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)