Vera Aleksić


2012

Large Scale Lexical Analysis
Gregor Thurmair | Vera Aleksić | Christoph Schwarz
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents a lexical analysis component implemented in the PANACEA project. The goal is to extract lexicon entries automatically from crawled corpora, using corpus-based methods for high-quality linguistic text processing and focusing on data quality without neglecting quantitative aspects. Lexical analysis assigns linguistic information (such as part of speech, inflectional class, gender, subcategorisation frame, and semantic properties) to all parts of the input text. If tokens are ambiguous, lexical analysis must provide all possible sets of annotations for later (syntactic) disambiguation, whether by tagging or by full parsing. The paper presents an approach for assigning part-of-speech tags to large German and English input corpora (> 50 million tokens), providing a workflow that takes crawled corpora as input and produces POS-tagged lemmata ready for lexicon integration. Tools include sentence splitting, lexicon lookup, decomposition, and POS defaulting. Evaluation shows that the overall error rate can be brought down to about 2% if language resources are properly designed. The complete workflow is implemented as a sequence of web services integrated into the PANACEA platform.

Creating Term and Lexicon Entries from Phrase Tables
Gregor Thurmair | Vera Aleksić
Proceedings of the 16th Annual Conference of the European Association for Machine Translation

2011

Personal Translator at WMT2011
Vera Aleksić | Gregor Thurmair
Proceedings of the Sixth Workshop on Statistical Machine Translation