Andreas van Cranenburgh


A Hybrid Rule-Based and Neural Coreference Resolution System with an Evaluation on Dutch Literature
Andreas van Cranenburgh | Esther Ploeger | Frank van den Berg | Remi Thüss
Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference

We introduce a modular, hybrid coreference resolution system that extends a rule-based baseline with three neural classifiers for the subtasks mention detection, mention attributes (gender, animacy, number), and pronoun resolution. The classifiers substantially increase coreference performance in our experiments with Dutch literature across all metrics on the development set: mention detection, LEA, CoNLL, and especially pronoun accuracy. However, on the test set, the best results are obtained with rule-based pronoun resolution. This inconsistent result highlights that the rule-based system is still a strong baseline, and more work is needed to improve pronoun resolution robustly for this dataset. While end-to-end neural systems require no feature engineering and achieve excellent performance in standard benchmarks with large training sets, our simple hybrid system scales well to long document coreference (>10k words) and attains superior results in our experiments on literature.

Stylometric Literariness Classification: the Case of Stephen King
Andreas van Cranenburgh | Erik Ketzan
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

This paper applies stylometry to quantify the literariness of 73 novels and novellas by American author Stephen King, chosen as an extraordinary case of a writer who has been dubbed both “high” and “low” in literariness in critical reception. We operationalize literariness using a measure of stylistic distance (Cosine Delta) based on the 1000 most frequent words in two bespoke comparison corpora used as proxies for literariness: one of popular genre fiction, another of National Book Award-winning authors. We report that a supervised model is highly effective in distinguishing the two categories, with 94.6% macro average in a binary classification. We define two subsets of texts by King—“high” and “low” literariness works as suggested by critics and ourselves—and find that a predictive model does identify King’s Dark Tower series and novels such as Dolores Claiborne as among his most “literary” texts, consistent with critical reception, which has also ascribed postmodern qualities to the Dark Tower novels. Our results demonstrate the efficacy of Cosine Delta-based stylometry in quantifying the literariness of texts, while also highlighting the methodological challenges of literariness, especially in the case of Stephen King. The code and data to reproduce our results are available at


What’s so special about BERT’s layers? A closer look at the NLP pipeline in monolingual and multilingual models
Wietse de Vries | Andreas van Cranenburgh | Malvina Nissim
Findings of the Association for Computational Linguistics: EMNLP 2020

Peeking into the inner workings of BERT has shown that its layers resemble the classical NLP pipeline, with progressively more complex tasks being concentrated in later layers. To investigate to what extent these results also hold for a language other than English, we probe a Dutch BERT-based model and the multilingual BERT model for Dutch NLP tasks. In addition, through a deeper analysis of part-of-speech tagging, we show that also within a given task, information is spread over different parts of the network and the pipeline might not be as neat as it seems. Each layer has different specialisations, so that it may be more useful to combine information from different layers, instead of selecting a single one based on the best overall performance.

A Benchmark of Rule-Based and Neural Coreference Resolution in Dutch Novels and News
Corbèn Poot | Andreas van Cranenburgh
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference

We evaluate a rule-based (Lee et al., 2013) and neural (Lee et al., 2018) coreference system on Dutch datasets of two domains: literary novels and news/Wikipedia text. The results provide insight into the relative strengths of data-driven and knowledge-driven systems, as well as the influence of domain, document length, and annotation schemes. The neural system performs best on news/Wikipedia text, while the rule-based system performs best on literature. The neural system shows weaknesses with limited training data and long documents, while the rule-based system is affected by annotation differences. The code and models used in this paper are available at

Results of a Single Blind Literary Taste Test with Short Anonymized Novel Fragments
Andreas van Cranenburgh | Corina Koolen
Proceedings of the The 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

It is an open question to what extent perceptions of literary quality are derived from text-intrinsic versus social factors. While supervised models can predict literary quality ratings from textual factors quite successfully, as shown in the Riddle of Literary Quality project (Koolen et al., 2020), this does not prove that social factors are not important, nor can we assume that readers make judgments on literary quality in the same way and based on the same information as machine learning models. We report the results of a pilot study to gauge the effect of textual features on literary ratings of Dutch-language novels by participants in a controlled experiment with 48 participants. In an exploratory analysis, we compare the ratings to those from the large reader survey of the Riddle in which social factors were not excluded, and to machine learning predictions of those literary ratings. We find moderate to strong correlations of questionnaire ratings with the survey ratings, but the predictions are closer to the survey ratings. Code and data:

Embarrassingly Simple Unsupervised Aspect Extraction
Stéphan Tulkens | Andreas van Cranenburgh
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present a simple but effective method for aspect identification in sentiment analysis. Our unsupervised method only requires word embeddings and a POS tagger, and is therefore straightforward to apply to new domains and languages. We introduce Contrastive Attention (CAt), a novel single-head attention mechanism based on an RBF kernel, which gives a considerable boost in performance and makes the model interpretable. Previous work relied on syntactic features and complex neural models. We show that given the simplicity of current benchmark datasets for aspect extraction, such complex models are not needed. The code to reproduce the experiments reported in this paper is available at


Active DOP: A constituency treebank annotation tool with online learning
Andreas van Cranenburgh
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

We present a language-independent treebank annotation tool supporting rich annotations with discontinuous constituents and function tags. Candidate analyses are generated by an exemplar-based parsing model that immediately learns from each new annotated sentence during annotation. This makes it suitable for situations in which only a limited seed treebank is available, or a radically different domain is being annotated. The tool offers the possibility to experiment with and evaluate active learning methods to speed up annotation in a naturalistic setting, i.e., measuring actual annotation costs and tracking specific user interactions. The code is made available under the GNU GPL license at

Cliche Expressions in Literary and Genre Novels
Andreas van Cranenburgh
Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Should writers “avoid clichés like the plague”? Clichés are said to be a prominent characteristic of “low brow” literature, and conversely, a negative marker of “high brow” literature. Clichés may concern the storyline, the characters, or the style of writing. We focus on cliché expressions, ready-made stock phrases which can be taken as a sign of uncreative writing. We present a corpus study in which we examine to what extent cliché expressions can be attested in a corpus of various kinds of contemporary fiction, based on a large, curated lexicon of cliché expressions. The results show to what extent the negative view on clichés is supported by data: we find a significant negative correlation of -0.48 between cliché density and literary ratings of texts. We also investigate interactions with genre and characterize the language of clichés with several basic textual features. Code used for this paper is available at

German and French Neural Supertagging Experiments for LTAG Parsing
Tatiana Bladier | Andreas van Cranenburgh | Younes Samih | Laura Kallmeyer
Proceedings of ACL 2018, Student Research Workshop

We present ongoing work on data-driven parsing of German and French with Lexicalized Tree Adjoining Grammars. We use a supertagging approach combined with deep learning. We show the challenges of extracting LTAG supertags from the French Treebank, introduce the use of left- and right-sister-adjunction, present a neural architecture for the supertagger, and report experiments of n-best supertagging for French and German.


pdf bib
These are not the Stereotypes You are Looking For: Bias and Fairness in Authorial Gender Attribution
Corina Koolen | Andreas van Cranenburgh
Proceedings of the First ACL Workshop on Ethics in Natural Language Processing

Stylometric and text categorization results show that author gender can be discerned in texts with relatively high accuracy. However, it is difficult to explain what gives rise to these results and there are many possible confounding factors, such as the domain, genre, and target audience of a text. More fundamentally, such classification efforts risk invoking stereotyping and essentialism. We explore this issue in two datasets of Dutch literary novels, using commonly used descriptive (LIWC, topic modeling) and predictive (machine learning) methods. Our results show the importance of controlling for variables in the corpus and we argue for taking care not to overgeneralize from the results.

A Data-Oriented Model of Literary Language
Andreas van Cranenburgh | Rens Bod
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We consider the task of predicting how literary a text is, with a gold standard from human ratings. Aside from a standard bigram baseline, we apply rich syntactic tree fragments, mined from the training set, and a series of hand-picked features. Our model is the first to distinguish degrees of highly and less literary novels using a variety of lexical and syntactic features, and explains 76.0 % of the variation in literary ratings.


Identifying Literary Texts with Bigrams
Andreas van Cranenburgh | Corina Koolen
Proceedings of the Fourth Workshop on Computational Linguistics for Literature

pdf bib
Multiword Expression Identification with Recurring Tree Fragments and Association Measures
Federico Sangati | Andreas van Cranenburgh
Proceedings of the 11th Workshop on Multiword Expressions


From high heels to weed attics: a syntactic investigation of chick lit and literature
Kim Jautze | Corina Koolen | Andreas van Cranenburgh | Hayco de Jong
Proceedings of the Workshop on Computational Linguistics for Literature

pdf bib
Discontinuous Parsing with an Efficient and Accurate DOP Model
Andreas van Cranenburgh | Rens Bod
Proceedings of the 13th International Conference on Parsing Technologies (IWPT 2013)


Building a Corpus of Indefinite Uses Annotated with Fine-grained Semantic Functions
Maria Aloni | Andreas van Cranenburgh | Raquel Fernández | Marta Sznajder
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Natural languages possess a wealth of indefinite forms that typically differ in distribution and interpretation. Although formal semanticists have strived to develop precise meaning representations for different indefinite functions, to date there has hardly been any corpus work on the topic. In this paper, we present the results of a small corpus study where English indefinite forms `any' and `some' were labelled with fine-grained semantic functions well-motivated by typological studies. We developed annotation guidelines that could be used by non-expert annotators and calculated inter-annotator agreement amongst several coders. The results show that the annotation task is hard, with agreement scores ranging from 52% to 62% depending on the number of functions considered, but also that each of the independent annotations is in accordance with theoretical predictions regarding the possible distributions of indefinite functions. The resulting annotated corpus is available upon request and can be accessed through a searchable online database.

Literary authorship attribution with phrase-structure fragments
Andreas van Cranenburgh
Proceedings of the NAACL-HLT 2012 Workshop on Computational Linguistics for Literature

Efficient parsing with Linear Context-Free Rewriting Systems
Andreas van Cranenburgh
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics


Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar
Andreas van Cranenburgh | Remko Scha | Federico Sangati
Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages