Verena Henrich

2017

pdf bib abs
Audience Segmentation in Social Media
Verena Henrich | Alexander Lang
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

Understanding the social media audience is becoming increasingly important for social media analysis. This paper presents an approach that detects various audience attributes, including author location, demographics, behavior and interests. It works both for a variety of social media sources and for multiple languages. The approach has been implemented within IBM Watson Analytics for Social Media and creates author profiles for more than 300 different analysis domains every day.

2016

pdf bib
Letter Sequence Labeling for Compound Splitting
Jianqiang Ma | Verena Henrich | Erhard Hinrichs
Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

2014

pdf bib abs
How to Tell a Schneemann from a Milchmann: An Annotation Scheme for Compound-Internal Relations
Corina Dima | Verena Henrich | Erhard Hinrichs | Christina Hoppermann
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents a language-independent annotation scheme for the semantic relations that link the constituents of noun-noun compounds, such as Schneemann ‘snow man’ or Milchmann ‘milk man’. The annotation scheme is hybrid in the sense that it assigns each compound a two-place label consisting of a semantic property and a prepositional paraphrase. The resulting inventory combines the insights of previous annotation schemes that rely exclusively on either semantic properties or prepositions, thus avoiding the known weaknesses that result from using only one of the two label types. The proposed annotation scheme has been used to annotate a set of 5112 German noun-noun compounds. A release of the dataset is currently being prepared and will be made available via the CLARIN Center Tübingen. In addition to the presentation of the hybrid annotation scheme, the paper also reports on an inter-annotator agreement study that has resulted in a substantial agreement among annotators.

pdf bib
Aligning Word Senses in GermaNet and the DWDS Dictionary of the German Language
Verena Henrich | Erhard Hinrichs | Reinhild Barkey
Proceedings of the Seventh Global Wordnet Conference

2012

pdf bib
Using Synthetic Compounds for Word Sense Discrimination
Yannick Versley | Verena Henrich
Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages

pdf bib
WebCAGe – A Web-Harvested Corpus Annotated with GermaNet Senses
Verena Henrich | Erhard Hinrichs | Tatiana Vodolazova
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib abs
A Comparative Evaluation of Word Sense Disambiguation Algorithms for German
Verena Henrich | Erhard Hinrichs
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The present paper explores a wide range of word sense disambiguation (WSD) algorithms for German. These WSD algorithms are based on a suite of semantic relatedness measures, including path-based, information-content-based, and gloss-based methods. Since the individual algorithms produce diverse results in terms of precision and thus complement each other well in terms of coverage, a set of combined algorithms is investigated and compared in performance to the individual algorithms. Among the single algorithms considered, a word overlap method derived from the Lesk algorithm that uses Wiktionary glosses and GermaNet lexical fields yields the best F-score of 56.36. This result is outperformed by a combined WSD algorithm that uses weighted majority voting and obtains an F-score of 63.59. The WSD experiments utilize the German wordnet GermaNet as a sense inventory as well as WebCAGe (short for: Web-Harvested Corpus Annotated with GermaNet Senses), a newly constructed, sense-annotated corpus for this language. The WSD experiments also confirm that WSD performance is lower for words with fine-grained sense distinctions compared to words with coarse-grained senses.

This paper presents the system architecture as well as the underlying workflow of the Extensible Repository System of Digital Objects (ERDO) which has been developed for the sustainable archiving of language resources within the Tübingen CLARIN-D project. In contrast to other approaches focusing on archiving experts, the described workflow can be used by researchers without required knowledge in the field of long-term storage for transferring data from their local file systems into a persistent repository.

2011

pdf bib
Determining Immediate Constituents of Compounds in GermaNet
Verena Henrich | Erhard Hinrichs
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

pdf bib abs
GernEdiT - The GermaNet Editing Tool
Verena Henrich | Erhard Hinrichs
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper introduces GernEdiT (short for: GermaNet Editing Tool), a new graphical user interface for the lexicographers and developers of GermaNet, the German version of the Princeton WordNet. GermaNet is a lexical-semantic net that relates German nouns, verbs, and adjectives. Traditionally, lexicographic work for extending the coverage of GermaNet utilized the Princeton WordNet development environment of lexicographer files. Due to a complex data format and no opportunity of automatic consistency checks, this process was very error prone and time consuming. The GermaNet Editing Tool GernEdiT was developed to overcome these shortcomings. The main purposes of the GernEdiT tool are, besides supporting lexicographers to access, modify, and extend GermaNet data in an easy and adaptive way, as follows: Replace the standard editing tools by a more user-friendly tool, use a relational database as data storage, support export formats in the form of XML, and facilitate internal consistency and correctness of the linguistic resource. All these core functionalities of GernEdiT along with the main aspects of the underlying lexical resource GermaNet and its current database format are presented in this paper.

pdf bib abs
Sustainability of Linguistic Data and Analysis in the Context of a Collaborative eScience Environment
Erhard Hinrichs | Verena Henrich | Thomas Zastrow
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

For researchers, it is especially important that primary research data are preserved and made available on a long-term basis and to a wide variety of researchers. In order to ensure long-term availability of the archived data, it is imperative that the data to be stored is conformant with standardized data formats and best practices followed by the relevant research communities. Storing, managing, and accessing such standard-conformant data requires a repository-based infrastructure. Two projects at the University of Tübingen are realizing a collaborative eScience research environment with the help of eSciDoc for the university that supports long-term preservation of all kinds of data as well as a fine-grained and contextualized data management: the INF project and the BW-eSci(T) project. The task of the infrastructure (INF) project within the collaborative research centre âEmergence of Meaning (SFB 833) is to guarantee the long-term availability of the SFBs data. BW-eSci(T) is a joint project of the University of Tübingen and the Fachinformationszentrums (FIZ) Karlsruhe. The goal of this project is to develop a prototypical eScience research environment for the University of Tübingen.

pdf bib
Standardizing Wordnets in the ISO Standard LMF: Wordnet-LMF for GermaNet
Verena Henrich | Erhard Hinrichs
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)