Richárd Farkas

Also published as: Richard Farkas


2023

A Question Answering Benchmark Database for Hungarian
Attila Novák | Borbála Novák | Tamás Zombori | Gergő Szabó | Zsolt Szántó | Richárd Farkas
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII)

Within the research presented in this article, we created a new question answering benchmark database for Hungarian called MILQA. When creating the dataset, we largely followed the principles of the English SQuAD 2.0; however, as in some more recent English question answering datasets, we introduced a number of innovations beyond SQuAD, e.g., yes/no questions, list-like answers consisting of several text spans, long answers, questions requiring calculation, and other question types where the answer cannot simply be copied from the text. For all these non-extractive question types, the pragmatically adequate form of the answer was also added to make the training of generative models possible. We implemented and evaluated a set of baseline retrieval and answer span extraction models on the dataset. BM25 performed better than any vector-based solution for retrieval. Cross-lingual transfer from English significantly improved span extraction models.

2018

SzegedKoref: A Hungarian Coreference Corpus
Veronika Vincze | Klára Hegedűs | Alex Sliz-Nagy | Richárd Farkas
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

E-magyar – A Digital Language Processing System
Tamás Váradi | Eszter Simon | Bálint Sass | Iván Mittelholcz | Attila Novák | Balázs Indig | Richárd Farkas | Veronika Vincze
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Universal Dependencies and Morphology for Hungarian - and on the Price of Universality
Veronika Vincze | Katalin Simkó | Zsolt Szántó | Richárd Farkas
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

In this paper, we present how the principles of universal dependencies and morphology have been adapted to Hungarian. We report the most challenging grammatical phenomena and our solutions to them. On the basis of the adapted guidelines, we have converted and manually corrected 1,800 sentences from the Szeged Treebank to universal dependency format. We also present experiments on this manually annotated corpus that evaluate automatic conversion and the added value of language-specific, i.e., non-universal, annotations. Our results reveal that converting to universal dependencies is not necessarily trivial; moreover, using language-specific morphological features may have an impact on overall performance.

2014

Special Techniques for Constituent Parsing of Morphologically Rich Languages
Zsolt Szántó | Richárd Farkas
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

Dependency parsing with latent refinements of part-of-speech tags
Thomas Mueller | Richard Farkas | Alex Judea | Helmut Schmid | Hinrich Schuetze
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Introducing the IMS-Wrocław-Szeged-CIS entry at the SPMRL 2014 Shared Task: Reranking and Morpho-syntax meet Unlabeled Data
Anders Björkelund | Özlem Çetinoğlu | Agnieszka Faleńska | Richárd Farkas | Thomas Mueller | Wolfgang Seeker | Zsolt Szántó
Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages

Szeged Corpus 2.5: Morphological Modifications in a Manually POS-tagged Hungarian Corpus
Veronika Vincze | Viktor Varga | Katalin Ilona Simkó | János Zsibrita | Ágoston Nagy | Richárd Farkas | János Csirik
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The Szeged Corpus is the largest manually annotated database containing the possible morphological analyses and lemmas for each word form. In this work, we present its latest version, Szeged Corpus 2.5, in which the new harmonized morphological coding system of Hungarian has been employed and the majority of misspelled words have been corrected and tagged with the proper morphological code. New morphological codes are introduced for participles, causative/modal/frequentative verbs, adverbial pronouns and punctuation marks; moreover, the distinction between common and proper nouns is eliminated. We also report some statistical data on the frequency of the new morphological codes. The new version of the corpus made it possible to train magyarlanc, a data-driven POS tagger of Hungarian, on a dataset with the new harmonized codes. According to the results, magyarlanc is able to achieve a state-of-the-art accuracy score on version 2.5 as well.

SZTE-NLP: Aspect level opinion mining exploiting syntactic cues
Viktor Hangya | Gábor Berend | István Varga | Richárd Farkas
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

SZTE-NLP: Clinical Text Analysis with Named Entity Recognition
Melinda Katona | Richárd Farkas
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

An Empirical Evaluation of Automatic Conversion from Constituency to Dependency in Hungarian
Katalin Ilona Simkó | Veronika Vincze | Zsolt Szántó | Richárd Farkas
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

magyarlanc: A Tool for Morphological and Dependency Parsing of Hungarian
János Zsibrita | Veronika Vincze | Richárd Farkas
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Joint Morphological and Syntactic Analysis for Richly Inflected Languages
Bernd Bohnet | Joakim Nivre | Igor Boguslavsky | Richárd Farkas | Filip Ginter | Jan Hajič
Transactions of the Association for Computational Linguistics, Volume 1

Joint morphological and syntactic analysis has been proposed as a way of improving parsing accuracy for richly inflected languages. Starting from a transition-based model for joint part-of-speech tagging and dependency parsing, we explore different ways of integrating morphological features into the model. We also investigate the use of rule-based morphological analyzers to provide hard or soft lexical constraints and the use of word clusters to tackle the sparsity of lexical features. Evaluation on five morphologically rich languages (Czech, Finnish, German, Hungarian, and Russian) shows consistent improvements in both morphological and syntactic accuracy for joint prediction over a pipeline model, with further improvements thanks to lexical constraints and word clusters. The final results improve the state of the art in dependency parsing for all languages.

Munich-Edinburgh-Stuttgart Submissions of OSM Systems at WMT13
Nadir Durrani | Alexander Fraser | Helmut Schmid | Hassan Sajjad | Richárd Farkas
Proceedings of the Eighth Workshop on Statistical Machine Translation

Munich-Edinburgh-Stuttgart Submissions at WMT13: Morphological and Syntactic Processing for SMT
Marion Weller | Max Kisselew | Svetlana Smekalova | Alexander Fraser | Helmut Schmid | Nadir Durrani | Hassan Sajjad | Richárd Farkas
Proceedings of the Eighth Workshop on Statistical Machine Translation

LFG-based Features for Noun Number and Article Grammatical Errors
Gábor Berend | Veronika Vincze | Sina Zarrieß | Richárd Farkas
Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task

(Re)ranking Meets Morphosyntax: State-of-the-art Results from the SPMRL 2013 Shared Task
Anders Björkelund | Özlem Çetinoğlu | Richárd Farkas | Thomas Mueller | Wolfgang Seeker
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages
Djamé Seddah | Reut Tsarfaty | Sandra Kübler | Marie Candito | Jinho D. Choi | Richárd Farkas | Jennifer Foster | Iakes Goenaga | Koldo Gojenola Galletebeitia | Yoav Goldberg | Spence Green | Nizar Habash | Marco Kuhlmann | Wolfgang Maier | Joakim Nivre | Adam Przepiórkowski | Ryan Roth | Wolfgang Seeker | Yannick Versley | Veronika Vincze | Marcin Woliński | Alina Wróblewska | Eric Villemonte de la Clergerie
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

SZTE-NLP: Sentiment Detection on Twitter Messages
Viktor Hangya | Gábor Berend | Richárd Farkas
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

Knowledge Sources for Constituent Parsing of German, a Morphologically Rich and Less-Configurational Language
Alexander Fraser | Helmut Schmid | Richárd Farkas | Renjing Wang | Hinrich Schütze
Computational Linguistics, Volume 39, Issue 1 - March 2013

Full-coverage Identification of English Light Verb Constructions
István Nagy T. | Veronika Vincze | Richárd Farkas
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Keyphrase-Driven Document Visualization Tool
Gábor Berend | Richárd Farkas
The Companion Volume of the Proceedings of IJCNLP 2013: System Demonstrations

Identifying English and Hungarian Light Verb Constructions: A Contrastive Approach
Veronika Vincze | István Nagy T. | Richárd Farkas
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Forest Reranking through Subtree Ranking
Richárd Farkas | Helmut Schmid
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Cross-Genre and Cross-Domain Detection of Semantic Uncertainty
György Szarvas | Veronika Vincze | Richárd Farkas | György Móra | Iryna Gurevych
Computational Linguistics, Volume 38, Issue 2 - June 2012

Stacking of Dependency and Phrase Structure Parsers
Richárd Farkas | Bernd Bohnet
Proceedings of COLING 2012

Data-driven Dependency Parsing With Empty Heads
Wolfgang Seeker | Richárd Farkas | Bernd Bohnet | Helmut Schmid | Jonas Kuhn
Proceedings of COLING 2012: Posters

Data-driven Multilingual Coreference Resolution using Resolver Stacking
Anders Björkelund | Richárd Farkas
Joint Conference on EMNLP and CoNLL - Shared Task

Dependency Parsing of Hungarian: Baseline Results and Challenges
Richárd Farkas | Veronika Vincze | Helmut Schmid
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

Features for Phrase-Structure Reranking from Dependency Parses
Richárd Farkas | Bernd Bohnet | Helmut Schmid
Proceedings of the 12th International Conference on Parsing Technologies

Learning Local Content Shift Detectors from Document-level Information
Richárd Farkas
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task
Richárd Farkas | Veronika Vincze | György Szarvas | György Móra | János Csirik
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task

The CoNLL-2010 Shared Task: Learning to Detect Hedges and their Scope in Natural Language Text
Richárd Farkas | Veronika Vincze | György Móra | János Csirik | György Szarvas
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task

SZTERGAK: Feature Engineering for Keyphrase Extraction
Gábor Berend | Richárd Farkas
Proceedings of the 5th International Workshop on Semantic Evaluation

2009

Exploring ways beyond the simple supervised learning approach for biological event extraction
György Móra | Richárd Farkas | György Szarvas | Zsolt Molnár
Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task

Researcher affiliation extraction from homepages
István Nagy | Richárd Farkas | Márk Jelasity
Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries (NLPIR4DL)

2008

Hungarian Word-Sense Disambiguated Corpus
Veronika Vincze | György Szarvas | Attila Almási | Dóra Szauter | Róbert Ormándi | Richárd Farkas | Csaba Hatvani | János Csirik
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

To create the first Hungarian WSD corpus, 39 suitable word forms were selected for word sense disambiguation. Among other criteria, each selected word form had to be frequent in Hungarian language usage and to have more than one frequently used sense. The HNC and its Heti Világgazdaság subcorpus provided the basis for corpus text selection. As a result, each sample has a relevant context (the whole article), and information on the lemma, POS tags and automatic tokenization is also available. When the corpus was planned, 300–500 samples of each word form were to be annotated. This size makes it possible to compare the subcorpora prepared for the individual word forms with data available for other languages. However, the finalized database also contains unannotated samples and samples with a single annotation, i.e., samples annotated by only one of the linguists. The corpus follows the format of the ACL SensEval/SemEval WSD tasks. The first version of the corpus was developed within the scope of the project titled The construction of Hungarian WordNet Ontology and its application in Information Extraction Systems (Hatvani et al., 2007). The corpus is available for research and educational purposes and can be downloaded free of charge.

The BioScope corpus: annotation for negation, uncertainty and their scope in biomedical texts
György Szarvas | Veronika Vincze | Richárd Farkas | János Csirik
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing

2007

GYDER: Maxent Metonymy Resolution
Richárd Farkas | Eszter Simon | György Szarvas | Dániel Varga
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

A highly accurate Named Entity corpus for Hungarian
György Szarvas | Richárd Farkas | László Felföldi | András Kocsor | János Csirik
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper introduces a highly accurate Named Entity (NE) corpus for Hungarian that is publicly available for research purposes, along with its main properties. The results of experiments that apply various Machine Learning models and classifier combination schemes are also presented to serve as a benchmark for further research based on the corpus. The data is a segment of the Szeged Corpus (Csendes et al., 2004), consisting of short business news articles collected from MTI (Hungarian News Agency, www.mti.hu). The annotation procedure was carried out with special attention to annotation accuracy. The corpus went through a parallel annotation phase done by two annotators, resulting in a tagging with an inter-annotator agreement rate of 99.89%. Disputed taggings were collected and discussed by the two annotators and a linguist with several years of experience in corpus annotation. These examples were tagged according to the decision they made together, and finally all entities with suspicious or dubious annotations were collected and checked for consistency. We consider the result of this correction process to be virtually free of errors. Our best-performing Named Entity Recognizer (NER) model attained an F-measure of 92.86% on the corpus.