Daniël de Kok


2020

In this paper, we introduce the SORTS Subject-Object Resolution Test Suite of German minimal sentence pairs for model introspection. The full test suite consists of 18,502 transitive clauses with manual annotations of 8 word order patterns, 5 morphological and syntactic and 11 semantic property classes. The test suite has been constructed such that sentences are minimal pairs with respect to a property class. Each property has been selected with a particular focus on its effect on subject-object resolution, the second-most error-prone task within syntactic parsing of German after prepositional phrase attachment (Fischer et al., 2019). The size and detail of annotations make the test suite a valuable resource for natural language processing applications with syntactic and semantic tasks. We use dependency parsing to demonstrate how the test suite allows insights into the process of subject-object resolution. Based on the test suite annotations, word order and case syncretism can be identified as most important factors that affect subject-object resolution.

2019

Composition models of distributional semantics are used to construct phrase representations from the representations of their words. Composition models are typically situated on two ends of a spectrum. They either have a small number of parameters but compose all phrases in the same way, or they perform word-specific compositions at the cost of a far larger number of parameters. In this paper we propose transformation weighting (TransWeight), a composition model that consistently outperforms existing models on nominal compounds, adjective-noun phrases, and adverb-adjective phrases in English, German, and Dutch. TransWeight drastically reduces the number of parameters needed compared with the best model in the literature by composing similar words in the same way.

2017

Prepostitional phrase (PP) attachment is a well known challenge to parsing. In this paper, we combine the insights of different works, namely: (1) treating PP attachment as a classification task with an arbitrary number of attachment candidates; (2) using auxiliary distributions to augment the data beyond the hand-annotated training set; (3) using topological fields to get information about the distribution of PP attachment throughout clauses and (4) using state-of-the-art techniques such as word embeddings and neural networks. We show that jointly using these techniques leads to substantial improvements. We also conduct a qualitative analysis to gauge where the ceiling of the task is in a realistic setup.

2016

2011

2010

2009