Andreas Nürnberger


2018

Contemporary Amharic Corpus: Automatically Morpho-Syntactically Tagged Amharic Corpus
Andargachew Mekonnen Gezmu | Binyam Ephrem Seyoum | Michael Gasser | Andreas Nürnberger
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing

We introduce the Contemporary Amharic Corpus, which is automatically tagged for morpho-syntactic information. The texts were collected from 25,199 documents across different domains, and about 24 million orthographic words were tokenized. Because it is partly a web corpus, we applied automatic spelling error correction. We also modified the existing morphological analyzer, HornMorpho, to use it for the automatic tagging.

Portable Spelling Corrector for a Less-Resourced Language: Amharic
Andargachew Mekonnen Gezmu | Andreas Nürnberger | Binyam Ephrem Seyoum
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2008

A Comparative Study on Language Identification Methods
Lena Grothe | Ernesto William De Luca | Andreas Nürnberger
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we present two experiments conducted to compare different language identification algorithms. Short-word-, frequent-word-, and n-gram-based approaches are considered and combined with the Ad-Hoc Ranking classification method. The language identification process can be subdivided into two main steps: first, a document model is generated for the document and a language model for each language; second, the language of the document is determined on the basis of the language models and is added to the document as additional information. In this work we present our evaluation results and discuss the importance of a dynamic value for the out-of-place measure.
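
The two-step process described in this abstract can be illustrated with a minimal sketch of rank-based n-gram language identification. This is not the authors' implementation: the function names, the profile size `top_k`, the n-gram range, and the `max_penalty` parameter are all assumptions made for illustration. Using the language profile length as the default penalty for unseen n-grams is only one possible reading of a "dynamic value" for the out-of-place measure.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top_k=300):
    """Build a ranked list of the most frequent character n-grams (1..n_max)."""
    # Normalize whitespace and pad so that word boundaries form n-grams too.
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(
        padded[i:i + n]
        for n in range(1, n_max + 1)
        for i in range(len(padded) - n + 1)
    )
    # Sort by descending count, then alphabetically so ranks are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile, max_penalty=None):
    """Sum the rank differences between a document model and a language model.

    N-grams missing from the language model cost `max_penalty`. By default
    (an assumption of this sketch) the penalty is the language profile length.
    """
    if max_penalty is None:
        max_penalty = len(lang_profile)
    lang_rank = {gram: rank for rank, gram in enumerate(lang_profile)}
    return sum(
        abs(rank - lang_rank[gram]) if gram in lang_rank else max_penalty
        for rank, gram in enumerate(doc_profile)
    )


def identify_language(text, lang_profiles, max_penalty=None):
    """Step 1: build the document model. Step 2: pick the closest language model."""
    doc_profile = ngram_profile(text)
    return min(
        lang_profiles,
        key=lambda lang: out_of_place(doc_profile, lang_profiles[lang], max_penalty),
    )
```

In practice the language models would be built from large training texts per language, and the returned label would be stored with the document as metadata.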

Arabic/English word translation disambiguation using parallel corpora and matching schemes
Farag Ahmed | Andreas Nürnberger
Proceedings of the 12th Annual conference of the European Association for Machine Translation

2006

Rebuilding Lexical Resources for Information Retrieval using Sense Folder Detection and Merging Methods
Ernesto William De Luca | Andreas Nürnberger
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we discuss the problem of sense disambiguation using lexical resources such as ontologies or thesauri, with a focus on applying sense detection and merging methods in information retrieval systems. For an information retrieval task, it is important to detect the meaning of a query word in order to retrieve the relevant documents. To recognize the meaning of a search word, lexical resources such as WordNet can be used for word sense disambiguation. However, an analysis of the WordNet structure shows that this ontology has several problems. For example, its overly fine-grained distinction between word senses is ill-suited to information retrieval. We describe related problems and present four implemented online methods for merging SynSets based on relations such as hypernyms and hyponyms, and on further context information such as glosses and domains. We then present a first evaluation of our approach, compare the different merging methods, and briefly discuss future work.
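
To make the idea of merging overly fine-grained senses concrete, here is a small sketch that groups senses sharing a hypernym or a domain label. It is not one of the four methods from the paper, whose details the abstract does not give. The function `merge_senses`, the input layout, and the merge criteria are assumptions, and any sense inventory passed in would be supplied by the caller rather than taken from WordNet.

```python
def merge_senses(senses):
    """Group senses that share at least one hypernym or the same domain label.

    `senses` maps a sense id to {"hypernyms": set, "domain": str or None}
    (a hypothetical layout). Returns a list of sets of sense ids.
    """
    # Union-find over sense ids; every sense starts in its own group.
    parent = {s: s for s in senses}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    def union(a, b):
        parent[find(a)] = find(b)

    ids = list(senses)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            share_hypernym = bool(senses[a]["hypernyms"] & senses[b]["hypernyms"])
            same_domain = (
                senses[a]["domain"] is not None
                and senses[a]["domain"] == senses[b]["domain"]
            )
            if share_hypernym or same_domain:
                union(a, b)

    # Collect the members of each merged group.
    groups = {}
    for s in ids:
        groups.setdefault(find(s), set()).add(s)
    return list(groups.values())
```

Because the grouping is transitive, two senses can end up merged through a third one even if they share nothing directly. A stricter criterion would need pairwise checks instead of union-find.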