David Arps


2022

pdf
Probing for Constituency Structure in Neural Language Models
David Arps | Younes Samih | Laura Kallmeyer | Hassan Sajjad
Findings of the Association for Computational Linguistics: EMNLP 2022

In this paper, we investigate to which extent contextual neural language models (LMs) implicitly learn syntactic structure. More concretely, we focus on constituent structure as represented in the Penn Treebank (PTB). Using standard probing techniques based on diagnostic classifiers, we assess the accuracy of representing constituents of different categories within the neuron activations of a LM such as RoBERTa. In order to make sure that our probe focuses on syntactic knowledge and not on implicit semantic generalizations, we also experiment on a PTB version that is obtained by randomly replacing constituents with each other while keeping syntactic structure, i.e., a semantically ill-formed but syntactically well-formed version of the PTB. We find that 4 pretrained transfomer LMs obtain high performance on our probing tasks even on manipulated data, suggesting that semantic and syntactic knowledge in their representations can be separated and that constituency information is in fact learned by the LM. Moreover, we show that a complete constituency tree can be linearly separated from LM representations.

pdf
HHUplexity at Text Complexity DE Challenge 2022
David Arps | Jan Kels | Florian Krämer | Yunus Renz | Regina Stodden | Wiebke Petersen
Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text

In this paper, we describe our submission to the ‘Text Complexity DE Challenge 2022’ shared task on predicting the complexity of German sentences. We compare performance of different feature-based regression architectures and transformer language models. Our best candidate is a fine-tuned German Distilbert model that ignores linguistic features of the sentences. Our model ranks 7th place in the shared task.

2018

pdf
A Parser for LTAG and Frame Semantics
David Arps | Simon Petitjean
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)