Dávid Márk Nemeskey
Also published as: David Mark Nemeskey
2025
Variety delights (sometimes) - Annotation differences in morphologically annotated corpora
Andrea Dömötör | Balázs Indig | Dávid Márk Nemeskey
Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025)
The goal of annotation standards is to ensure consistency across different corpora and languages. But do they succeed? In our paper we experiment with morphologically annotated Hungarian corpora of different sizes (ELTE DH gold standard corpus, NYTK-NerKor, and Szeged Treebank) to assess their compatibility as a merged training corpus for morphological analysis and disambiguation. Our results show that combining any two corpora not only failed to improve the results of the trained tagger but even degraded them due to the inconsistent annotations. Further analysis of the annotation differences among the corpora revealed inconsistencies from several sources: different theoretical approaches, lack of consensus, and tagset conversion issues.
2023
huPWKP: A Hungarian Text Simplification Corpus
Noémi Prótár | Dávid Márk Nemeskey
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
In this article we introduce huPWKP, the first parallel corpus consisting of Hungarian standard language-simplified sentence pairs. As Hungarian is a quite low-resource language with regard to text simplification, we opted for translating an already existing corpus, PWKP (Zhu et al., 2010), which we cleaned in order to improve its quality. We evaluated the corpus both with the help of human evaluators and by training a seq2seq model on both the Hungarian corpus and the original (cleaned) English corpus. The Hungarian model performed slightly worse in terms of automatic metrics; however, the English model attains a SARI score close to the state of the art on the official PWKP set. According to the human evaluation, the corpus performs at around 3 on a scale ranging from 1 to 5 in terms of information retention and increase in simplification and around 3.7 in terms of grammaticality.
2016
Detecting Optional Arguments of Verbs
András Kornai | Dávid Márk Nemeskey | Gábor Recski
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
We propose a novel method for detecting optional arguments of Hungarian verbs using only positive data. We introduce a custom variant of collexeme analysis that explicitly models the noise in verb frames. Our method is, for the most part, unsupervised: we use the spectral clustering algorithm described in Brew and Schulte im Walde (2002) to build a noise model from a short, manually verified seed list of verbs. We experimented with both raw count- and context-based clusterings and found their performance almost identical. The code for our algorithm and the frame list are freely available at http://hlt.bme.hu/en/resources/tade.
Evaluating multi-sense embeddings for semantic resolution monolingually and in word translation
Gábor Borbély | Márton Makrai | Dávid Márk Nemeskey | András Kornai
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP
2015
Competence in lexical semantics
András Kornai | Judit Ács | Márton Makrai | Dávid Márk Nemeskey | Katalin Pajkossy | Gábor Recski
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics
2014
Why Implementation Matters: Evaluation of an Open-source Constraint Grammar Parser
Dávid Márk Nemeskey | Francis Tyers | Mans Hulden
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
2013
Applicative structure in vector space models
Márton Makrai | David Mark Nemeskey | András Kornai
Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality