Georg Heigold


2018

How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse?
Georg Heigold | Stalin Varanasi | Günter Neumann | Josef van Genabith
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

2017

An Extensive Empirical Evaluation of Character-Based Morphological Tagging for 14 Languages
Georg Heigold | Guenter Neumann | Josef van Genabith
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

This paper investigates neural character-based morphological tagging for languages with complex morphology and large tag sets. Character-based approaches are attractive because they handle rare and unseen words gracefully. We evaluate on 14 languages and observe consistent gains over a state-of-the-art morphological tagger across all languages except English and French, where we match the state of the art. We compare two architectures for computing character-based word vectors, using recurrent (RNN) and convolutional (CNN) networks. We show that the CNN-based approach performs slightly worse and less consistently than the RNN-based approach. Small but systematic gains are observed when combining the two architectures by ensembling.

Common Round: Application of Language Technologies to Large-Scale Web Debates
Hans Uszkoreit | Aleksandra Gabryszak | Leonhard Hennig | Jörg Steffen | Renlong Ai | Stephan Busemann | Jon Dehdari | Josef van Genabith | Georg Heigold | Nils Rethmeier | Raphael Rubino | Sven Schmeier | Philippe Thomas | He Wang | Feiyu Xu
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

Web debates play an important role in enabling broad participation of constituencies in social, political and economic decision-making. However, it is challenging to organize, structure, and navigate a vast number of diverse arguments and comments collected from many participants over a long period of time. In this paper we demonstrate Common Round, a next-generation platform for large-scale web debates, which provides functions for eliciting the semantic content and structure of participants' contributions. In particular, Common Round applies language technologies to extract the semantic essence of textual input and to aggregate the formulated opinions and arguments. The platform also provides cross-lingual access to debates using machine translation.

Cross-lingual Character-Level Neural Morphological Tagging
Ryan Cotterell | Georg Heigold
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Even for common NLP tasks, sufficient supervision is not available in many languages – morphological tagging is no exception. In the work presented here, we explore a transfer learning scheme, whereby we train character-level recurrent neural taggers to predict morphological taggings for high-resource languages and low-resource languages together. Learning joint character representations among multiple related languages successfully enables knowledge transfer from the high-resource languages to the low-resource ones.