Anne Göhring

Also published as: Anne Goehring


Animacy Denoting German Nouns: Annotation and Classification
Manfred Klenner | Anne Göhring
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we introduce a gold standard for animacy detection comprising almost 14,500 German nouns that might be used to denote either animate entities or non-animate entities. We present inter-annotator agreement of our crowd-sourced seed annotations (9,000 nouns) and discuss the results of machine learning models applied to this data.

Polar Quantification of Actor Noun Phrases for German
Anne Göhring | Manfred Klenner
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we discuss work that strives to measure the degree of negativity (the negative polar load) of noun phrases, especially those denoting actors. Since no gold standard data is available for German for this quantification task, we generated a silver standard and used it to fine-tune a BERT-based intensity regressor. We evaluated the quality of the silver standard empirically and found that our lexicon-based quantification metric showed a strong correlation with human annotators.

Semantic Role Labeling for Sentiment Inference: A Case Study
Manfred Klenner | Anne Göhring
Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)


DeInStance: Creating and Evaluating a German Corpus for Fine-Grained Inferred Stance Detection
Anne Göhring | Manfred Klenner | Sophia Conrad
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)

Getting Hold of Villains and other Rogues
Manfred Klenner | Anne Göhring | Sophia Conrad
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

In this paper, we introduce the first corpus specifying negative entities within sentences. We discuss indicators for their presence, namely particular verbs, but also the linguistic conditions under which their prediction should be suppressed. We further show that a fine-tuned BERT-based baseline model outperforms an over-generating rule-based approach that is not aware of these further restrictions. If a perfect filter were applied, both would be on par.


Encoder-Decoder Methods for Text Normalization
Massimo Lusetti | Tatyana Ruzsics | Anne Göhring | Tanja Samardžić | Elisabeth Stark
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

Text normalization is the task of mapping non-canonical language, typical of speech transcription and computer-mediated communication, to a standardized written form. It is an upstream task necessary to enable the subsequent direct use of standard natural language processing tools, and it is indispensable for languages such as Swiss German, which has strong regional variation and no written standard. Text normalization has been addressed with a variety of methods, most successfully with character-level statistical machine translation (CSMT). In the meantime, machine translation has changed, and the new methods, known as neural encoder-decoder (ED) models, have resulted in remarkable improvements. Text normalization, however, has not yet followed. A number of neural methods have been tried, but CSMT remains the state of the art. In this work, we normalize Swiss German WhatsApp messages using the ED framework. We exploit the flexibility of this framework, which allows us to learn from the same training data in different ways. In particular, we modify the decoding stage of a plain ED model to include target-side language models operating at different levels of granularity: characters and words. Our systematic comparison shows that our approach results in an improvement over the CSMT state of the art.


Building a Spanish-German Dictionary for Hybrid MT
Anne Göhring
Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra)


Machine Learning Disambiguation of Quechua Verb Morphology
Annette Rios Gonzales | Anne Göhring
Proceedings of the Second Workshop on Hybrid Approaches to Translation


A tree is a Baum is an árbol is a sach’a: Creating a trilingual treebank
Annette Rios | Anne Göhring
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes the process of constructing a trilingual parallel treebank. While for two of the involved languages, Spanish and German, there are already corpora with well-established annotation schemes available, this is not the case with the third language: Cuzco Quechua (ISO 639-3:quz), a low-resourced, non-standardized language for which we had to define a linguistically plausible annotation scheme first.


Le corpus Text+Berg Une ressource parallèle alpin français-allemand (The Text+Berg Corpus An Alpine French-German Parallel Resource)
Anne Göhring | Martin Volk
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

This article presents a French-German parallel corpus of more than 4 million words, obtained by digitizing a multilingual Alpine corpus. The corpus is a valuable resource for many studies in comparative linguistics and cultural heritage, as well as for the development of a domain-specific statistical machine translation system. We annotated a sample of this parallel corpus and aligned the tree structures at the word, constituent, and sentence levels. This "alpine treebank" is the first high-quality (manually checked), freely accessible French-German parallel treebank, and it covers a new domain and genre: mountaineering accounts.


Combining Parallel Treebanks and Geo-Tagging
Martin Volk | Anne Goehring | Torsten Marek
Proceedings of the Fourth Linguistic Annotation Workshop