Esther Ploeger


2023

pdf
The Past, Present, and Future of Typological Databases in NLP
Emi Baylor | Esther Ploeger | Johannes Bjerva
Findings of the Association for Computational Linguistics: EMNLP 2023

Typological information has the potential to be beneficial in the development of NLP models, particularly for low-resource languages. Unfortunately, current large-scale typological databases, notably WALS and Grambank, are inconsistent both with each other and with other sources of typological information, such as linguistic grammars. Some of these inconsistencies stem from coding errors or linguistic variation, but many of the disagreements are due to the discrete categorical nature of these databases. We shed light on this issue by systematically exploring disagreements across typological databases and resources, and their uses in NLP, covering the past and present. We next investigate the future of such work, offering an argument that a continuous view of typological features is clearly beneficial, echoing recommendations from linguistics. We propose that such a view of typology has significant potential in the future, including in language modeling in low-resource scenarios.

2022

pdf
RUG-1-Pegasussers at SemEval-2022 Task 3: Data Generation Methods to Improve Recognizing Appropriate Taxonomic Word Relations
Frank van den Berg | Gijs Danoe | Esther Ploeger | Wessel Poelman | Lukas Edman | Tommaso Caselli
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes our system created for the SemEval 2022 Task 3: Presupposed Taxonomies - Evaluating Neural-network Semantics. This task is focused on correctly recognizing taxonomic word relations in English, French and Italian. We developed various datageneration techniques that expand the originally provided train set and show that all methods increase the performance of modelstrained on these expanded datasets. Our final system outperformed the baseline system from the task organizers by achieving an average macro F1 score of 79.6 on all languages, compared to the baseline’s 67.4.

2021

pdf
A Hybrid Rule-Based and Neural Coreference Resolution System with an Evaluation on Dutch Literature
Andreas van Cranenburgh | Esther Ploeger | Frank van den Berg | Remi Thüss
Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference

We introduce a modular, hybrid coreference resolution system that extends a rule-based baseline with three neural classifiers for the subtasks mention detection, mention attributes (gender, animacy, number), and pronoun resolution. The classifiers substantially increase coreference performance in our experiments with Dutch literature across all metrics on the development set: mention detection, LEA, CoNLL, and especially pronoun accuracy. However, on the test set, the best results are obtained with rule-based pronoun resolution. This inconsistent result highlights that the rule-based system is still a strong baseline, and more work is needed to improve pronoun resolution robustly for this dataset. While end-to-end neural systems require no feature engineering and achieve excellent performance in standard benchmarks with large training sets, our simple hybrid system scales well to long document coreference (>10k words) and attains superior results in our experiments on literature.