Anouck Braggaar


2022

pdf
A reproduction study of methods for evaluating dialogue system output: Replicating Santhanam and Shaikh (2019)
Anouck Braggaar | Frédéric Tomas | Peter Blomsma | Saar Hommes | Nadine Braun | Emiel van Miltenburg | Chris van der Lee | Martijn Goudbeek | Emiel Krahmer
Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges

In this paper, we describe our reproduction ef- fort of the paper: Towards Best Experiment Design for Evaluating Dialogue System Output by Santhanam and Shaikh (2019) for the 2022 ReproGen shared task. We aim to produce the same results, using different human evaluators, and a different implementation of the automatic metrics used in the original paper. Although overall the study posed some challenges to re- produce (e.g. difficulties with reproduction of automatic metrics and statistics), in the end we did find that the results generally replicate the findings of Santhanam and Shaikh (2019) and seem to follow similar trends.

2021

pdf
Challenges in Annotating and Parsing Spoken, Code-switched, Frisian-Dutch Data
Anouck Braggaar | Rob van der Goot
Proceedings of the Second Workshop on Domain Adaptation for NLP

While high performance have been obtained for high-resource languages, performance on low-resource languages lags behind. In this paper we focus on the parsing of the low-resource language Frisian. We use a sample of code-switched, spontaneously spoken data, which proves to be a challenging setup. We propose to train a parser specifically tailored towards the target domain, by selecting instances from multiple treebanks. Specifically, we use Latent Dirichlet Allocation (LDA), with word and character N-grams. We use a deep biaffine parser initialized with mBERT. The best single source treebank (nl_alpino) resulted in an LAS of 54.7 whereas our data selection outperformed the single best transfer treebank and led to 55.6 LAS on the test data. Additional experiments consisted of removing diacritics from our Frisian data, creating more similar training data by cropping sentences and running our best model using XLM-R. These experiments did not lead to a better performance.