Agnieszka Falenska

Also published as: Agnieszka Faleńska

2021

Assessing Gender Bias in Wikipedia: Inequalities in Article Titles
Agnieszka Falenska | Özlem Çetinoğlu
Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing

Potential gender biases in Wikipedia’s content can contribute to biased behavior in a variety of downstream NLP systems. Yet, efforts to understand how women and men are portrayed unequally in Wikipedia have so far focused only on *biographies*, leaving open the question of how often such harmful patterns occur in other topics. In this paper, we investigate gender-related asymmetries in Wikipedia titles from *all domains*. We find that only half of gender-related articles, i.e., articles with words such as *women* or *male* in their titles, have a symmetrical counterpart that describes the same concept for the other gender (and clearly states it in its title). Among the remaining imbalanced cases, the vast majority of articles concern sports and social issues. We provide insights into how such asymmetries can influence other Wikipedia components and propose steps towards reducing the frequency of the observed patterns.
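
The core check in this abstract asks whether a gender-related title has a counterpart for the other gender. The sketch below shows that idea on plain strings. It is a minimal illustration, not the authors' method. The swap table `SWAPS` and the helpers `counterpart` and `split_symmetric` are hypothetical, and the example titles are toy data.

```python
from typing import List, Optional, Tuple

# Hypothetical swap table (lowercased); the paper's actual word list may differ.
SWAPS = {
    "women": "men", "men": "women",
    "woman": "man", "man": "woman",
    "women's": "men's", "men's": "women's",
    "female": "male", "male": "female",
}


def counterpart(title: str) -> Optional[str]:
    """Swap every gendered word in `title`; return None if there is none."""
    out, changed = [], False
    for word in title.split(" "):
        repl = SWAPS.get(word.lower())
        if repl is None:
            out.append(word)
            continue
        # Keep the capitalisation of the original word ("Women's" -> "Men's").
        out.append(repl.capitalize() if word[:1].isupper() else repl)
        changed = True
    return " ".join(out) if changed else None


def split_symmetric(titles: List[str]) -> Tuple[List[str], List[str]]:
    """Partition gender-related titles by whether their counterpart exists."""
    existing = set(titles)
    symmetric, asymmetric = [], []
    for title in titles:
        other = counterpart(title)
        if other is None:  # not a gender-related title
            continue
        (symmetric if other in existing else asymmetric).append(title)
    return symmetric, asymmetric


titles = ["Women's football in England", "Men's football in England",
          "Women in science", "Tennis"]
symmetric, asymmetric = split_symmetric(titles)
# symmetric: both football titles; asymmetric: ["Women in science"]
```

Exact string swapping misses counterparts that are phrased differently, so a real analysis would need a looser matching step.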

2020

We present GRAIN-S, a set of manually created syntactic annotations for radio interviews in German. The dataset extends the existing GRAIN corpus and provides constituency and dependency trees for six interviews. The rare combination of gold- and silver-standard annotation layers from GRAIN with high-quality syntax trees can serve as a useful resource for both speech- and text-based research. Moreover, since interviews fall between carefully prepared speech and spontaneous conversational speech, they cover phenomena not seen in traditional newspaper-based treebanks. GRAIN-S can therefore contribute to research on model adaptation and on building more corpus-independent tools. GRAIN-S follows the annotation scheme of TIGER, one of the established syntactic treebanks of German. We describe the annotation process and discuss the decisions needed to adapt the original TIGER guidelines to the interview domain. Next, we detail the conversion from TIGER-style trees to dependency trees. We provide data statistics and demonstrate differences between the new dataset and existing out-of-domain test sets annotated with TIGER syntactic structures. Finally, we provide baseline parsing results for further comparison.
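
The abstract mentions converting TIGER-style constituency trees to dependency trees. A common generic technique for this is head percolation. Each phrase gets a head child from a head table, and the lexical heads of all other children attach to the phrase's lexical head. The sketch below is not the conversion procedure described in the paper. The node type `Node` and the heavily simplified head table `HEAD_RULES` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    index: Optional[int] = None  # token position; set only on preterminals
    word: str = ""


# Hypothetical head table: preferred head-child labels per phrase, in priority order.
HEAD_RULES = {
    "S": ["VVFIN", "VAFIN", "VP"],
    "VP": ["VVPP", "VVINF"],
    "NP": ["NN", "NE", "PPER"],
}


def lexical_head(node: Node) -> int:
    """Token index of the lexical head of `node`, found by head percolation."""
    if node.index is not None:
        return node.index
    head_child = node.children[0]  # fallback: leftmost child
    for label in HEAD_RULES.get(node.label, []):
        match = next((c for c in node.children if c.label == label), None)
        if match is not None:
            head_child = match
            break
    return lexical_head(head_child)


def attach(node: Node, heads: List[int]) -> None:
    """Attach the lexical head of every non-head child to the phrase head."""
    if node.index is not None:
        return
    h = lexical_head(node)
    for child in node.children:
        ch = lexical_head(child)
        if ch != h:
            heads[ch] = h
        attach(child, heads)


def convert(tree: Node, n_tokens: int) -> List[int]:
    heads = [-1] * n_tokens  # -1 marks the root token
    attach(tree, heads)
    return heads


# "Die Katze schläft": S[NP[ART Die, NN Katze], VVFIN schläft]
tree = Node("S", [Node("NP", [Node("ART", index=0, word="Die"),
                              Node("NN", index=1, word="Katze")]),
                  Node("VVFIN", index=2, word="schläft")])
print(convert(tree, 3))  # [1, 2, -1]
```

Real TIGER trees also contain discontinuous constituents and phrases without an obvious head, which a production converter has to handle explicitly.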

Integrating Graph-Based and Transition-Based Dependency Parsers in the Deep Contextualized Era
Agnieszka Falenska | Anders Björkelund | Jonas Kuhn
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

Graph-based and transition-based dependency parsers used to have different strengths and weaknesses. Therefore, combining the outputs of parsers from both paradigms used to be the standard approach to improving or analyzing their performance. However, with the recent adoption of deep contextualized word representations, the chief weakness of graph-based models, i.e., their limited scope of features, has been mitigated. Through two popular combination techniques, blending and stacking, we demonstrate that the remaining diversity in the parsing models is reduced below the level of models trained with different random seeds. Thus, integrating the two parsers no longer leads to increased accuracy. When both parsers depend on BiLSTMs, the graph-based architecture has a consistent advantage. This advantage stems from globally trained BiLSTM representations, which capture more distant look-ahead syntactic relations. Such representations can be exploited through multi-task learning, which improves the transition-based parser, especially on treebanks with a high ratio of right-headed dependencies.
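
Blending, one of the two combination techniques named above, can be approximated by weighted voting over the head that each parser predicts for each token. The sketch below is a simplification, not the paper's implementation. Heads follow the CoNLL-U convention, with tokens numbered from 1 and head 0 marking the root. Blending as usually described reparses the voted arc weights with a maximum-spanning-tree algorithm to guarantee a well-formed tree. Per-token voting, as shown here, can produce cycles. The function names are hypothetical. `disagreement` is just one simple way to quantify how different two parsers' outputs are. It is not necessarily the diversity measure used in the paper.

```python
from collections import Counter
from typing import List, Optional, Sequence


def vote_heads(predictions: Sequence[List[int]],
               weights: Optional[List[float]] = None) -> List[int]:
    """Per-token weighted vote over the heads predicted by several parsers."""
    if weights is None:
        weights = [1.0] * len(predictions)
    blended = []
    for i in range(len(predictions[0])):
        scores: Counter = Counter()
        for pred, w in zip(predictions, weights):
            scores[pred[i]] += w
        # Highest score wins; ties go to the smaller head index for determinism.
        blended.append(max(scores, key=lambda h: (scores[h], -h)))
    return blended


def disagreement(a: List[int], b: List[int]) -> float:
    """Fraction of tokens whose predicted head differs between two parsers."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


graph_based = [2, 0, 2, 3]
transition_based = [2, 0, 2, 2]
other_seed = [2, 0, 4, 3]
print(vote_heads([graph_based, transition_based, other_seed]))  # [2, 0, 2, 3]
print(disagreement(graph_based, transition_based))               # 0.25
```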

2019

IMSurReal: IMS at the Surface Realization Shared Task 2019
Xiang Yu | Agnieszka Falenska | Marina Haid | Ngoc Thang Vu | Jonas Kuhn
Proceedings of the 2nd Workshop on Multilingual Surface Realisation (MSR 2019)

We introduce the IMS contribution to the Surface Realization Shared Task 2019. Our submission achieves state-of-the-art performance without using any external resources. The system takes a pipeline approach consisting of five steps: linearization, completion, inflection, contraction, and detokenization. We compare the performance of our linearization algorithm with two external baselines and report results for each step in the pipeline. Furthermore, we perform a detailed error analysis revealing a correlation between word order freedom and the difficulty of the linearization task.

The (Non-)Utility of Structural Features in BiLSTM-based Dependency Parsers
Agnieszka Falenska | Jonas Kuhn
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Classical non-neural dependency parsers put considerable effort into the design of feature functions. In particular, they benefit from structural features, such as features drawn from neighboring tokens in the dependency tree. In contrast, their BiLSTM-based successors achieve state-of-the-art performance without explicit information about the structural context. In this paper, we aim to answer the question: How much structural context are the BiLSTM representations able to capture implicitly? We show that features drawn from partial subtrees become redundant when BiLSTMs are used. We provide detailed insight into the information flow in transition- and graph-based neural architectures to show where the implicit information comes from when the parsers make their decisions. Finally, with model ablations we demonstrate that the structural context is not only present in the models but also significantly influences their performance.
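
The abstract discusses structural features drawn from partial subtrees. The sketch below shows the kind of feature classical parsers extracted from the partially built tree: the leftmost and rightmost dependents of a token, plus dependent counts. The feature names and helper functions are hypothetical and are not taken from the paper. Arcs are stored as a dependent-to-head mapping.

```python
from typing import Dict, List

NONE = "<NONE>"


def dependents(arcs: Dict[int, int], head: int) -> List[int]:
    """Dependents already attached to `head` in the partial tree (dependent -> head)."""
    return sorted(d for d, h in arcs.items() if h == head)


def structural_features(arcs: Dict[int, int], words: List[str],
                        token: int) -> Dict[str, str]:
    """Hypothetical template: the token plus its outermost dependents so far."""
    deps = dependents(arcs, token)
    left = [d for d in deps if d < token]
    right = [d for d in deps if d > token]
    return {
        "w": words[token],
        "lc1.w": words[left[0]] if left else NONE,     # leftmost left dependent
        "rc1.w": words[right[-1]] if right else NONE,  # rightmost right dependent
        "n_left": str(len(left)),
        "n_right": str(len(right)),
    }


words = ["<ROOT>", "the", "old", "cat", "sleeps"]
partial_arcs = {1: 3, 2: 3}  # "the" and "old" already attached to "cat"
print(structural_features(partial_arcs, words, 3))
```

In a BiLSTM-based parser, features like these would be replaced by the contextual vectors of the tokens themselves. The paper reports that this makes such features largely redundant.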

Dependency Length Minimization vs. Word Order Constraints: An Empirical Study On 55 Treebanks
Xiang Yu | Agnieszka Falenska | Jonas Kuhn
Proceedings of the First Workshop on Quantitative Syntax (Quasy, SyntaxFest 2019)

Head-First Linearization with Tree-Structured Representation
Xiang Yu | Agnieszka Falenska | Ngoc Thang Vu | Jonas Kuhn
Proceedings of the 12th International Conference on Natural Language Generation

We present a dependency tree linearization model with two novel components: (1) a tree-structured encoder based on a bidirectional Tree-LSTM that propagates information first bottom-up and then top-down, which allows each token to access information from the entire tree; and (2) a linguistically motivated head-first decoder that emphasizes the central role of the head and linearizes the subtree by incrementally attaching the dependents on both sides of the head. With the new encoder and decoder, we reach state-of-the-art performance on the Surface Realization Shared Task 2018 dataset, outperforming not only the shared task participants but also previous state-of-the-art systems (Bohnet et al., 2011; Puduppully et al., 2016). Furthermore, we analyze the power of the tree-structured encoder with a probing task and show that it is able to recognize the topological relation between any pair of tokens in a tree.
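
The head-first decoding order described above starts each subtree with its head alone. Each dependent's already-linearized subtree is then attached on the left or right edge. The sketch below shows only that control flow. The learned scoring of the real decoder is replaced by a hypothetical `side` callback and a fixed dependent order, so this is an illustration of the traversal, not the model.

```python
from typing import Callable, Dict, List


def linearize(head: str, deps: Dict[str, List[str]],
              side: Callable[[str, str], str]) -> List[str]:
    """Head-first linearization: begin with the head, then attach each
    dependent's linearized subtree on the left ("L") or right ("R")."""
    order = [head]
    for dep in deps.get(head, []):  # dependents in the (assumed) given order
        sub = linearize(dep, deps, side)
        order = sub + order if side(head, dep) == "L" else order + sub
    return order


# Toy tree for "the old cat sleeps soundly"; dependents listed nearest first.
deps = {"sleeps": ["cat", "soundly"], "cat": ["old", "the"]}
side = lambda head, dep: "R" if dep == "soundly" else "L"  # hypothetical oracle
print(linearize("sleeps", deps, side))
# ['the', 'old', 'cat', 'sleeps', 'soundly']
```

Because left attachments are prepended, dependents processed later end up farther from the head. This is why the toy example lists each head's dependents nearest first.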

2018

Moving TIGER beyond Sentence-Level
Agnieszka Falenska | Kerstin Eckart | Jonas Kuhn
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

A General-Purpose Tagger with Convolutional Neural Networks
Xiang Yu | Agnieszka Falenska | Ngoc Thang Vu
Proceedings of the First Workshop on Subword and Character Level Models in NLP

We present a general-purpose tagger based on convolutional neural networks (CNN), used for both composing word vectors and encoding context information. The CNN tagger is robust across different tagging tasks: without task-specific tuning of hyper-parameters, it achieves state-of-the-art results in part-of-speech tagging, morphological tagging and supertagging. The CNN tagger is also robust against the out-of-vocabulary problem; it performs well on artificially unnormalized texts.
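
The tagger composes word vectors with a CNN over characters. The sketch below shows that composition step: convolve filters over padded character embeddings, then max-pool over positions. It also shows why unseen words still receive a vector, since unknown characters fall back to an `<UNK>` embedding. It is written in pure Python with random, untrained parameters. The parameter shapes and the helpers `make_params` and `word_vector` are hypothetical, and the context-encoding part of the tagger is omitted.

```python
import random
import string
from typing import Dict, List

Vector = List[float]


def make_params(dim: int = 5, n_filters: int = 4, width: int = 3, seed: int = 0):
    """Randomly initialised toy parameters (untrained, illustration only)."""
    rng = random.Random(seed)
    vocab = list(string.ascii_lowercase) + ["<PAD>", "<UNK>"]
    emb = {c: [rng.uniform(-1, 1) for _ in range(dim)] for c in vocab}
    filters = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
               for _ in range(n_filters)]
    return emb, filters


def word_vector(word: str, emb: Dict[str, Vector],
                filters: List[List[Vector]]) -> Vector:
    """Char-CNN word vector: convolution, max-pooling over positions, then ReLU.
    Expects a non-empty word."""
    width = len(filters[0])
    pad = ["<PAD>"] * (width // 2)
    chars = pad + list(word.lower()) + pad
    xs = [emb.get(c, emb["<UNK>"]) for c in chars]  # unseen chars -> <UNK>
    features = []
    for f in filters:
        scores = [
            sum(w * x for k in range(width) for w, x in zip(f[k], xs[start + k]))
            for start in range(len(xs) - width + 1)
        ]
        features.append(max(0.0, max(scores)))  # ReLU after max-pooling
    return features


emb, filters = make_params()
vec = word_vector("cat", emb, filters)  # one feature per filter
```

Applying ReLU after max-pooling gives the same result as applying it before, because ReLU is monotonic.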

Lexicalized vs. Delexicalized Parsing in Low-Resource Scenarios
Agnieszka Falenska | Özlem Çetinoğlu
Proceedings of the 15th International Conference on Parsing Technologies

We present a systematic analysis of lexicalized vs. delexicalized parsing in low-resource scenarios, and propose a methodology to choose one method over another under certain conditions. We create a set of simulation experiments on 41 languages and apply our findings to 9 low-resource languages. Experimental results show that our methodology chooses the best approach in 8 out of 9 cases.

IMS at the CoNLL 2017 UD Shared Task: CRFs and Perceptrons Meet Neural Networks
Anders Björkelund | Agnieszka Falenska | Xiang Yu | Jonas Kuhn
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

This paper presents the IMS contribution to the CoNLL 2017 Shared Task. In the preprocessing step, we employed a CRF POS/morphological tagger and a neural tagger predicting supertags. For some languages, we also applied word segmentation with the CRF tagger and sentence segmentation with a perceptron-based parser. For parsing, we took an ensemble approach by blending multiple instances of three parsers with very different architectures. Our system achieved third place overall and second place for the surprise languages.