Rafael Ehren


2020

pdf bib
Proceedings of the 19th International Workshop on Treebanks and Linguistic Theories
Kilian Evang | Laura Kallmeyer | Rafael Ehren | Simon Petitjean | Esther Seyffarth | Djamé Seddah
Proceedings of the 19th International Workshop on Treebanks and Linguistic Theories

pdf bib
Supervised Disambiguation of German Verbal Idioms with a BiLSTM Architecture
Rafael Ehren | Timm Lichte | Laura Kallmeyer | Jakub Waszczuk
Proceedings of the Second Workshop on Figurative Language Processing

Supervised disambiguation of verbal idioms (VID) poses special demands on the quality and quantity of the annotated data used for learning and evaluation. In this paper, we present a new VID corpus for German and perform a series of VID disambiguation experiments on it. Our best classifier, based on a neural architecture, yields an error reduction across VIDs of 57% in terms of accuracy compared to a simple majority baseline.

2019

pdf bib
A Neural Graph-based Approach to Verbal MWE Identification
Jakub Waszczuk | Rafael Ehren | Regina Stodden | Laura Kallmeyer
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

We propose to tackle the problem of verbal multiword expression (VMWE) identification using a neural graph parsing-based approach. Our solution involves encoding VMWE annotations as labellings of dependency trees and, subsequently, applying a neural network to model the probabilities of different labellings. This strategy can be particularly effective when applied to discontinuous VMWEs and, thanks to dense, pre-trained word vector representations, VMWEs unseen during training. Evaluation of our approach on three PARSEME datasets (German, French, and Polish) shows that it allows to achieve performance on par with the previous state-of-the-art (Al Saied et al., 2018).

2018

pdf bib
Mumpitz at PARSEME Shared Task 2018: A Bidirectional LSTM for the Identification of Verbal Multiword Expressions
Rafael Ehren | Timm Lichte | Younes Samih
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

In this paper, we describe Mumpitz, the system we submitted to the PARSEME Shared task on automatic identification of verbal multiword expressions (VMWEs). Mumpitz consists of a Bidirectional Recurrent Neural Network (BRNN) with Long Short-Term Memory (LSTM) units and a heuristic that leverages the dependency information provided in the PARSEME corpus data to differentiate VMWEs in a sentence. We submitted results for seven languages in the closed track of the task and for one language in the open track. For the open track we used the same system, but with pretrained instead of randomly initialized word embeddings to improve the system performance.

2017

pdf bib
Literal or idiomatic? Identifying the reading of single occurrences of German multiword expressions using word embeddings
Rafael Ehren
Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics

Non-compositional multiword expressions (MWEs) still pose serious issues for a variety of natural language processing tasks and their ubiquity makes it impossible to get around methods which automatically identify these kind of MWEs. The method presented in this paper was inspired by Sporleder and Li (2009) and is able to discriminate between the literal and non-literal use of an MWE in an unsupervised way. It is based on the assumption that words in a text form cohesive units. If the cohesion of these units is weakened by an expression, it is classified as literal, and otherwise as idiomatic. While Sporleder an Li used Normalized Google Distance to modell semantic similarity, the present work examines the use of a variety of different word embeddings.