Frame-semantic parsers traditionally predict predicates, frames, and semantic roles in a fixed order. This paper explores the ‘chicken-or-egg’ problem of interdependencies between these components theoretically and practically. We introduce a flexible BERT-based sequence labeling architecture that allows for predicting frames and roles independently from each other or combining them in several ways. Our results show that our setups can approximate more complex traditional models’ performance, while allowing for a clearer view of the interdependencies between the pipeline’s components, and of how frame and role prediction models make different use of BERT’s layers.
In this paper, we introduce the task of using FrameNet to link structured information about real-world events to the conceptual frames used in texts describing these events. We show that frames made relevant by the knowledge of the real-world event can be captured by complementing standard lexicon-driven FrameNet annotations with frame annotations derived through pragmatic inference. We propose a two-layered annotation scheme with a ‘strict’ FrameNet-compatible lexical layer and a ‘loose’ layer capturing frames that are inferred from referential data.
Translating to and from low-resource polysynthetic languages present numerous challenges for NMT. We present the results of our systems for the English–Inuktitut language pair for the WMT 2020 translation tasks. We investigated the importance of correct morphological segmentation, whether or not adding data from a related language (Greenlandic) helps, and whether using contextual word embeddings improves translation. While each method showed some promise, the results are mixed.
In this article, we lay out the basic ideas and principles of the project Framing Situations in the Dutch Language. We provide our first results of data acquisition, together with the first data release. We introduce the notion of cross-lingual referential corpora. These corpora consist of texts that make reference to exactly the same incidents. The referential grounding allows us to analyze the framing of these incidents in different languages and across different texts. During the project, we will use the automatically generated data to study linguistic framing as a phenomenon, build framing resources such as lexicons and corpora. We expect to capture larger variation in framing compared to traditional approaches for building such resources. Our first data release, which contains structured data about a large number of incidents and reference texts, can be found at http://dutchframenet.nl/data-releases/.
Recent work in cognitive neuroscience has introduced models for predicting distributional word meaning representations from brain imaging data. Such models have great potential, but the quality of their predictions has not yet been thoroughly evaluated from a computational linguistics point of view. Due to the limited size of available brain imaging datasets, standard quality metrics (e.g. similarity judgments and analogies) cannot be used. Instead, we investigate the use of several alternative measures for evaluating the predicted distributional space against a corpus-derived distributional space. We show that a state-of-the-art decoder, while performing impressively on metrics that are commonly used in cognitive neuroscience, performs unexpectedly poorly on our metrics. To address this, we propose strategies for improving the model’s performance. Despite returning promising results, our experiments also demonstrate that much work remains to be done before distributional representations can reliably be predicted from brain data.
The present study proposes an annotation scheme for classifying the content and discourse contribution of question-answer pairs. We propose detailed guidelines for using the scheme and apply them to dialogues in English, Spanish, and Dutch. Finally, we report on initial machine learning experiments for automatic annotation.