Gabor Melli


2020

GM-RKB WikiText Error Correction Task and Baselines
Gabor Melli | Abdelrhman Eldallal | Bassim Lazem | Olga Moreira
Proceedings of the Twelfth Language Resources and Evaluation Conference

We introduce the GM-RKB WikiText Error Correction Task for the automatic detection and correction of typographical errors in WikiText-annotated pages. The included corpus is based on a snapshot of the GM-RKB domain-specific semantic wiki, which consists of a large collection of concepts, personages, and publications primarily centered on data mining and machine learning research topics. Numerous Wikipedia pages were also included as additional training data in the task’s evaluation process. The corpus was then automatically updated to synthetically include realistic errors, so that the original text serves as the ground truth for training and evaluation. We designed and evaluated two supervised baseline WikiFixer error correction methods: (1) a naive approach based on a maximum likelihood character-level language model, and (2) an advanced model based on a sequence-to-sequence (seq2seq) neural network architecture. Both error correction models operated at the character level. When compared against an off-the-shelf word-level spell checker, these methods showed a significant improvement in task performance, with the seq2seq-based model correcting more errors than it introduced. Finally, we published our data and code.
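
The abstract describes building training pairs by synthetically injecting errors into clean WikiText. The following is a minimal sketch of that general idea, not the authors' actual noise model: the function name `inject_typos`, the four edit operations, the default `rate`, and the lowercase `alphabet` are all assumptions made for illustration.

```python
import random


def inject_typos(text, rate=0.05, seed=0, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Return a noisy copy of `text` with random character-level edits.

    Hypothetical noise model: each character is corrupted with
    probability `rate` by a delete, insert, substitute, or transpose.
    """
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if rng.random() < rate:
            op = rng.choice(["delete", "insert", "substitute", "transpose"])
            if op == "delete":
                i += 1  # drop the character
                continue
            if op == "insert":
                out.append(rng.choice(alphabet))  # spurious character first
                out.append(ch)
            elif op == "substitute":
                out.append(rng.choice(alphabet))
            elif op == "transpose" and i + 1 < len(text):
                out.append(text[i + 1])  # swap with the next character
                out.append(ch)
                i += 2
                continue
            else:
                out.append(ch)  # transpose at the last position: keep as-is
        else:
            out.append(ch)
        i += 1
    return "".join(out)


clean = "[[Machine Learning]] is a [[Research Area]]."
noisy = inject_typos(clean, rate=0.1, seed=42)
pair = (noisy, clean)  # (model input, ground-truth target)
```

Each `(noisy, clean)` pair can then serve as a supervised example for a character-level corrector. Because the clean text is kept, counting corrected versus newly introduced errors, as in the evaluation the abstract mentions, stays straightforward.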

2014

Proceedings of TextGraphs-9: the workshop on Graph-based Methods for Natural Language Processing
V.G.Vinod Vydiswaran | Amarnag Subramanya | Gabor Melli | Irina Matveeva
Proceedings of TextGraphs-9: the workshop on Graph-based Methods for Natural Language Processing

2013

Proceedings of TextGraphs-8 Graph-based Methods for Natural Language Processing
Zornitsa Kozareva | Irina Matveeva | Gabor Melli | Vivi Nastase
Proceedings of TextGraphs-8 Graph-based Methods for Natural Language Processing

2012

Identifying Untyped Relation Mentions in a Corpus given an Ontology
Gabor Melli
Workshop Proceedings of TextGraphs-7: Graph-based Methods for Natural Language Processing

2010

Concept Mentions within KDD-2009 Abstracts (kdd09cma1) Linked to a KDD Ontology (kddo1)
Gabor Melli
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We introduce the kddo1 ontology and the semantically annotated kdd09cma1 corpus from the field of knowledge discovery in databases (KDD) research. The corpus is based on the abstracts of the papers accepted into the KDD-2009 conference. Each abstract has its concept mentions identified and, where possible, linked to the appropriate concept in the ontology. The ontology is based on a human-generated and human-readable semantic wiki focused on concepts and relationships for the domain, along with other related topics, papers, and researchers from the information sciences. To our knowledge, this is the first ontology and interlinked corpus for a subdiscipline within computing science. The dataset enables the evaluation of supervised approaches to semantic annotation of documents that contain a large number of high-level concepts relative to the number of named entity mentions. We plan to continue to evolve the ontology based on the discovered relations within the corpus and to extend the corpus to cover other research paper abstracts from the domain. Both resources are publicly available at http://www.gabormelli.com/Projects/kdd/data/.