Simon Dobnik

2023

Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023)
Nikolai Ilinykh | Felix Morger | Dana Dannélls | Simon Dobnik | Beáta Megyesi | Joakim Nivre
Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023)

2022

Look and Answer the Question: On the Role of Vision in Embodied Question Answering
Nikolai Ilinykh | Yasmeen Emampoor | Simon Dobnik
Proceedings of the 15th International Conference on Natural Language Generation

Attention as Grounding: Exploring Textual and Cross-Modal Attention on Entities and Relations in Language-and-Vision Transformer
Nikolai Ilinykh | Simon Dobnik
Findings of the Association for Computational Linguistics: ACL 2022

We explore how a multi-modal transformer trained for generation of longer image descriptions learns syntactic and semantic representations about entities and relations grounded in objects at the level of masked self-attention (text generation) and cross-modal attention (information fusion). We observe that cross-attention learns the visual grounding of noun phrases into objects and high-level semantic information about spatial relations, while text-to-text attention captures low-level syntactic knowledge between words. We conclude that language models in a multi-modal task learn different semantic information about objects and relations cross-modally and uni-modally (text-only). Our code is available here: https://github.com/GU-CLASP/attention-as-grounding.

Proceedings of the 2022 CLASP Conference on (Dis)embodiment
Simon Dobnik | Julian Grove | Asad Sayeed
Proceedings of the 2022 CLASP Conference on (Dis)embodiment

In this paper we examine different meaning representations that are commonly used in different natural language applications today and discuss their limits, both in terms of the aspects of the natural language meaning they are modelling and in terms of the aspects of the application for which they are used.

Examining the Effects of Language-and-Vision Data Augmentation for Generation of Descriptions of Human Faces
Nikolai Ilinykh | Rafal Černiavski | Eva Elžbieta Sventickaitė | Viktorija Buzaitė | Simon Dobnik
Proceedings of the 2nd Workshop on People in Vision, Language, and the Mind

We investigate how different augmentation techniques on both textual and visual representations affect the performance of the face description generation model. Specifically, we provide the model with either original images, sketches of faces, facial composites or distorted images. In addition, on the language side, we experiment with different methods to augment the original dataset with paraphrased captions, which are semantically equivalent to the original ones, but differ in terms of their form. We also examine if augmenting the dataset with descriptions from a different domain (e.g., image captions of real-world images) has an effect on the performance of the models. We train models on different combinations of visual and linguistic features and perform both (i) automatic evaluation of generated captions and (ii) examination of how useful different visual features are for the task of facial feature classification. Our results show that although original images encode the best possible representation for the task, the model trained on sketches can still perform relatively well. We also observe that augmenting the dataset with descriptions from a different domain can boost performance of the model. We conclude that face description generation systems are more susceptible to language data augmentation than to vision data augmentation. Overall, we demonstrate that face caption generation models display a strong imbalance in the utilisation of language and vision modalities, indicating a lack of proper information fusion. We also describe ethical implications of our study and argue that future work on human face description generation should create better, more representative datasets.

Anaphoric Phenomena in Situated dialog: A First Round of Annotations
Sharid Loáiciga | Simon Dobnik | David Schlangen
Proceedings of the Fifth Workshop on Computational Models of Reference, Anaphora and Coreference

We present a first release of 500 documents from the multimodal corpus Tell-me-more (Ilinykh et al., 2019) annotated with coreference information according to the ARRAU guidelines (Poesio et al., 2021). The corpus consists of images and short texts of five sentences. We describe the annotation process and present the adaptations to the original guidelines in order to account for the challenges of grounding the annotations to the image. 50 documents from the 500 available are annotated by two people and used to estimate inter-annotator agreement (IAA) relying on Krippendorff’s alpha.

Pre-trained Models or Feature Engineering: The Case of Dialectal Arabic
Kathrein Abu Kwaik | Stergios Chatzikyriakidis | Simon Dobnik
Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection

The usage of social media platforms has resulted in the proliferation of work on Arabic Natural Language Processing (ANLP), including the development of resources. There is also an increased interest in processing Arabic dialects and a number of models and algorithms have been utilised for the purpose of Dialectal Arabic Natural Language Processing (DANLP). In this paper, we conduct a comparison study between some of the most well-known and most commonly used methods in NLP in order to test their performance on different corpora and two NLP tasks: Dialect Identification and Sentiment Analysis. In particular, we compare three general classes of models: a) traditional Machine Learning models with features, b) classic Deep Learning architectures (LSTMs) with pre-trained word embeddings and lastly c) different Bidirectional Encoder Representations from Transformers (BERT) models such as Multilingual-BERT, Ara-BERT, and Twitter-Arabic-BERT. The results of the comparison show that using feature-based classification can still compete with BERT models in these dialectal Arabic contexts. Transformer models can outperform traditional Machine Learning approaches, depending on the type of text they have been trained on, in contrast to classic Deep Learning models like LSTMs, which do not perform well on the tasks.

Do Decoding Algorithms Capture Discourse Structure in Multi-Modal Tasks? A Case Study of Image Paragraph Generation
Nikolai Ilinykh | Simon Dobnik
Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)

This paper describes insights into how different inference algorithms structure discourse in image paragraphs. We train a multi-modal transformer and compare 11 variations of decoding algorithms. We propose to evaluate image paragraphs not only with standard automatic metrics, but also with a more extensive, “under the hood” analysis of the discourse formed by sentences. Our results show that while decoding algorithms can be unfaithful to the reference texts, they still generate grounded descriptions, but they also lack understanding of the discourse structure and differ from humans in terms of attentional structure over images.

In this paper, we present a number of fine-grained resources for Natural Language Inference (NLI). In particular, we present a number of resources and validation methods for Greek NLI and a resource for precise NLI. First, we extend the Greek version of the FraCaS test suite to include examples where the inference is directly linked to the syntactic/morphological properties of Greek. The new resource contains an additional 428 examples, making it in total a dataset of 774 examples. Expert annotators have been used in order to create the additional resource, while extensive validation of the original Greek version of the FraCaS by non-expert and expert subjects is performed. Next, we continue the work initiated by (CITATION), according to which a subset of the RTE problems have been labeled for missing hypotheses and we present a dataset an order of magnitude larger, annotating the whole SuperGLUE/RTE dataset with missing hypotheses. Lastly, we provide a de-dropped version of the Greek XNLI dataset, where the pronouns that are missing due to the pro-drop nature of the language are inserted. We then run some models to see the effect of that insertion and report the results.

2021

Proceedings of the Reasoning and Interaction Conference (ReInAct 2021)
Christine Howes | Simon Dobnik | Ellen Breitholtz | Stergios Chatzikyriakidis
Proceedings of the Reasoning and Interaction Conference (ReInAct 2021)

Reference and coreference in situated dialogue
Sharid Loáiciga | Simon Dobnik | David Schlangen
Proceedings of the Second Workshop on Advances in Language and Vision Research

In recent years several corpora have been developed for vision and language tasks. We argue that there is still significant room for corpora that increase the complexity of both visual and linguistic domains and which capture different varieties of perceptual and conversational contexts. Working with two corpora approaching this goal, we present a linguistic perspective on some of the challenges in creating and extending resources combining language and vision while preserving continuity with the existing best practices in the area of coreference annotation.

Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
Simon Dobnik | Lilja Øvrelid
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

How Vision Affects Language: Comparing Masked Self-Attention in Uni-Modal and Multi-Modal Transformer
Nikolai Ilinykh | Simon Dobnik
Proceedings of the 1st Workshop on Multimodal Semantic Representations (MMSR)

The problem of interpretation of knowledge learned by multi-head self-attention in transformers has been one of the central questions in NLP. However, much of this work has focused on models trained for uni-modal tasks, e.g. machine translation. In this paper, we examine masked self-attention in a multi-modal transformer trained for the task of image captioning. In particular, we test whether the multi-modality of the task objective affects the learned attention patterns. Our visualisations of masked self-attention demonstrate that (i) it can learn general linguistic knowledge of the textual input, and (ii) its attention patterns incorporate artefacts from the visual modality even though it has never accessed it directly. We compare our transformer’s attention patterns with masked attention in distilgpt-2 tested for uni-modal text generation of image captions. Based on the maps of extracted attention weights, we argue that masked self-attention in an image captioning transformer seems to be enhanced with semantic knowledge from images, exemplifying joint language-and-vision information in its attention patterns.

Annotating anaphoric phenomena in situated dialogue
Sharid Loáiciga | Simon Dobnik | David Schlangen
Proceedings of the 1st Workshop on Multimodal Semantic Representations (MMSR)

In recent years several corpora have been developed for vision and language tasks. With this paper, we intend to start a discussion on the annotation of referential phenomena in situated dialogue. We argue that there is still significant room for corpora that increase the complexity of both visual and linguistic domains and which capture different varieties of perceptual and conversational contexts. In addition, a rich annotation scheme covering a broad range of referential phenomena and compatible with the textual task of coreference resolution is necessary in order to take the most advantage of these corpora. Consequently, there are several open questions regarding the semantics of reference and annotation, and the extent to which standard textual coreference accounts for the situated dialogue genre. Working with two corpora on situated dialogue, we present our extension to the ARRAU (Uryupina et al., 2020) annotation scheme in order to start this discussion.

2020

An Arabic Tweets Sentiment Analysis Dataset (ATSAD) using Distant Supervision and Self Training
Kathrein Abu Kwaik | Stergios Chatzikyriakidis | Simon Dobnik | Motaz Saad | Richard Johansson
Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection

As the number of social media users increases, more people use these platforms to express their thoughts and needs, socialise, and publish their opinions and reviews. For good social media sentiment analysis, good quality resources are needed, and the lack of these resources is particularly evident for languages other than English, in particular Arabic. The available Arabic resources are lacking in either the size of the corpus or the quality of the annotation. In this paper, we present an Arabic Sentiment Analysis Corpus collected from Twitter, which contains 36K tweets labelled as positive or negative. We applied distant supervision and self-training approaches to annotate the corpus. In addition, we release 8K manually annotated tweets as a gold standard. We evaluated the corpus intrinsically by comparing it to human classification and pre-trained sentiment analysis models. Moreover, we apply extrinsic evaluation methods exploiting the sentiment analysis task and achieve an accuracy of 86%.

Fast visual grounding in interaction: bringing few-shot learning with neural networks to an interactive robot
José Miguel Cano Santín | Simon Dobnik | Mehdi Ghanimifard
Proceedings of the Probability and Meaning Conference (PaM 2020)

The major shortcomings of using neural networks with situated agents are that in incremental interaction very few learning examples are available and that their visual sensory representations are quite different from image caption datasets. In this work we adapt and evaluate a few-shot learning approach, Matching Networks (Vinyals et al., 2016), to conversational strategies of a robot interacting with a human tutor in order to efficiently learn to categorise objects that are presented to it and also investigate to what degree transfer learning from pre-trained models on images from different contexts can improve its performance. We discuss the implications of such learning on the nature of semantic representations the system has learned.

Sky + Fire = Sunset. Exploring Parallels between Visually Grounded Metaphors and Image Classifiers
Yuri Bizzoni | Simon Dobnik
Proceedings of the Second Workshop on Figurative Language Processing

This work explores the differences and similarities between neural image classifiers’ mis-categorisations and visually grounded metaphors, which we could conceive of as intentional mis-categorisations. We discuss the possibility of using automatic image classifiers to approximate human metaphoric behaviours, and the limitations of such a frame. We report two pilot experiments to study grounded metaphoricity. In the first we represent metaphors as a form of visual mis-categorisation. In the second we model metaphors as a more flexible, compositional operation in a continuous visual space generated from automatic classification systems.

When an Image Tells a Story: The Role of Visual and Semantic Information for Generating Paragraph Descriptions
Nikolai Ilinykh | Simon Dobnik
Proceedings of the 13th International Conference on Natural Language Generation

Generating multi-sentence image descriptions is a challenging task, which requires a good model to produce coherent and accurate paragraphs, describing salient objects in the image. We argue that multiple sources of information are beneficial when describing visual scenes with long sequences. These include (i) perceptual information and (ii) semantic (language) information about how to describe what is in the image. We also compare the effects of using two different pooling mechanisms on either a single modality or their combination. We demonstrate that the model which utilises both visual and language inputs can be used to generate accurate and diverse paragraphs when combined with a particular pooling mechanism. The results of our automatic and human evaluation show that learning to embed semantic information along with visual stimuli into the paragraph generation model is not trivial, raising a variety of proposals for future experiments.

2019

Normalising Non-standardised Orthography in Algerian Code-switched User-generated Data
Wafia Adouane | Jean-Philippe Bernardy | Simon Dobnik
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

We work with Algerian, an under-resourced non-standardised Arabic variety, for which we compile a new parallel corpus consisting of user-generated textual data matched with normalised and corrected human annotations following data-driven and our linguistically motivated standard. We use an end-to-end deep neural model designed to deal with context-dependent spelling correction and normalisation. Results indicate that a model with two CNN sub-network encoders and an LSTM decoder performs the best, and that word context matters. Additionally, pre-processing data token-by-token with an edit-distance based aligner significantly improves the performance. We get promising results for the spelling correction and normalisation, as a pre-processing step for downstream tasks, on detecting binary Semantic Textual Similarity.

Proceedings of the 13th International Conference on Computational Semantics - Long Papers
Simon Dobnik | Stergios Chatzikyriakidis | Vera Demberg
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

Proceedings of the 13th International Conference on Computational Semantics - Short Papers
Simon Dobnik | Stergios Chatzikyriakidis | Vera Demberg
Proceedings of the 13th International Conference on Computational Semantics - Short Papers

Proceedings of the 13th International Conference on Computational Semantics - Student Papers
Simon Dobnik | Stergios Chatzikyriakidis | Vera Demberg | Kathrein Abu Kwaik | Vladislav Maraev
Proceedings of the 13th International Conference on Computational Semantics - Student Papers

ImageTTR: Grounding Type Theory with Records in Image Classification for Visual Question Answering
Arild Matsson | Simon Dobnik | Staffan Larsson
Proceedings of the IWCS 2019 Workshop on Computing Semantics with Types, Frames and Related Structures

What a neural language model tells us about spatial relations
Mehdi Ghanimifard | Simon Dobnik
Proceedings of the Combined Workshop on Spatial Language Understanding (SpLU) and Grounded Communication for Robotics (RoboNLP)

Understanding and generating spatial descriptions requires knowledge about what objects are related, their functional interactions, and where the objects are geometrically located. Different spatial relations have different functional and geometric bias. The wide usage of neural language models in different areas including generation of image description motivates the study of what kind of knowledge is encoded in neural language models about individual spatial relations. With the premise that the functional bias of relations is expressed in their word distributions, we construct multi-word distributional vector representations and show that these representations perform well on intrinsic semantic reasoning tasks, thus confirming our premise. A comparison of our vector representations to human semantic judgments indicates that different bias (functional or geometric) is captured in different data collection tasks which suggests that the contribution of the two meaning modalities is dynamic, related to the context of the task.

Neural Models for Detecting Binary Semantic Textual Similarity for Algerian and MSA
Wafia Adouane | Jean-Philippe Bernardy | Simon Dobnik
Proceedings of the Fourth Arabic Natural Language Processing Workshop

We explore the extent to which neural networks can learn to identify semantically equivalent sentences from a small variable dataset using end-to-end training. We collect a new noisy non-standardised user-generated Algerian (ALG) dataset and also translate it to Modern Standard Arabic (MSA) which serves as its regularised counterpart. We compare the performance of various models on both datasets and report the best performing configurations. The results show that relatively simple models composed of 2 LSTM layers outperform by far other more sophisticated attention-based architectures, for both ALG and MSA datasets.

Can Modern Standard Arabic Approaches be used for Arabic Dialects? Sentiment Analysis as a Case Study
Kathrein Abu Kwaik | Stergios Chatzikyriakidis | Simon Dobnik
Proceedings of the 3rd Workshop on Arabic Corpus Linguistics

What goes into a word: generating image descriptions with top-down spatial knowledge
Mehdi Ghanimifard | Simon Dobnik
Proceedings of the 12th International Conference on Natural Language Generation

Generating grounded image descriptions requires associating linguistic units with their corresponding visual clues. A common method is to train a decoder language model with an attention mechanism over convolutional visual features. Attention weights align the stratified visual features arranged by their location with tokens, most commonly words, in the target description. However, words such as spatial relations (e.g. next to and under) do not refer directly to geometric arrangements of pixels but to complex geometric and conceptual representations. The aim of this paper is to evaluate what representations facilitate generating image descriptions with spatial relations and lead to better grounded language generation. In particular, we investigate the contribution of three different representational modalities in generating relational referring expressions: (i) pre-trained convolutional visual features, (ii) different top-down geometric relational knowledge between objects, and (iii) world knowledge captured by contextual embeddings in language models.

2018

A Comparison of Character Neural Language Model and Bootstrapping for Language Identification in Multilingual Noisy Texts
Wafia Adouane | Simon Dobnik | Jean-Philippe Bernardy | Nasredine Semmar
Proceedings of the Second Workshop on Subword/Character LEvel Models

This paper seeks to examine the effect of including background knowledge in the form of a character pre-trained neural language model (LM), and data bootstrapping to overcome the problem of unbalanced limited resources. As a test, we explore the task of language identification in mixed-language short non-edited texts with an under-resourced language, namely the case of Algerian Arabic for which both labelled and unlabelled data are limited. We compare the performance of two traditional machine learning methods and a deep neural network (DNN) model. The results show that overall DNNs perform better on labelled data for the majority categories and struggle with the minority ones. While the effect of the untokenised and unlabelled data encoded as LM differs for each category, bootstrapping improves the performance of all systems and all categories. These methods are language independent and could be generalised to other under-resourced languages for which a small amount of labelled data and a larger amount of unlabelled data are available.

Exploring the Functional and Geometric Bias of Spatial Relations Using Neural Language Models
Simon Dobnik | Mehdi Ghanimifard | John Kelleher
Proceedings of the First International Workshop on Spatial Language Understanding

The challenge for computational models of spatial descriptions for situated dialogue systems is the integration of information from different modalities. The semantics of spatial descriptions are grounded in at least two sources of information: (i) a geometric representation of space and (ii) the functional interaction of related objects. We train several neural language models on descriptions of scenes from a dataset of image captions and examine whether the functional or geometric bias of spatial descriptions reported in the literature is reflected in the estimated perplexity of these models. The results of these experiments have implications for the creation of models of spatial lexical semantics for human-robot dialogue systems. Furthermore, they also provide an insight into the kinds of semantic knowledge captured by neural language models trained on spatial descriptions, which has implications for image captioning systems.

Improving Neural Network Performance by Injecting Background Knowledge: Detecting Code-switching and Borrowing in Algerian texts
Wafia Adouane | Jean-Philippe Bernardy | Simon Dobnik
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching

We explore the effect of injecting background knowledge into different deep neural network (DNN) configurations in order to mitigate the problem of the scarcity of annotated data when applying these models to datasets of low-resourced languages. The background knowledge is encoded in the form of lexicons and pre-trained sub-word embeddings. The DNN models are evaluated on the task of detecting code-switching and borrowing points in non-standardised user-generated Algerian texts. Overall results show that DNNs benefit from adding background knowledge. However, the gain varies between models and categories. The proposed DNN architectures are generic and could be applied to other low-resourced languages.

Shami: A Corpus of Levantine Arabic Dialects
Kathrein Abu Kwaik | Motaz Saad | Stergios Chatzikyriakidis | Simon Dobnik
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

KILLE: a Framework for Situated Agents for Learning Language Through Interaction
Simon Dobnik | Erik de Graaf
Proceedings of the 21st Nordic Conference on Computational Linguistics

Identification of Languages in Algerian Arabic Multilingual Documents
Wafia Adouane | Simon Dobnik
Proceedings of the Third Arabic Natural Language Processing Workshop

This paper presents a language identification system designed to detect the language of each word, in its context, in multilingual documents as generated in social media by bilingual/multilingual communities, in our case speakers of Algerian Arabic. We frame the task as a sequence tagging problem and use supervised machine learning with standard methods like HMM and Ngram classification tagging. We also experiment with a lexicon-based method. Combining all the methods in a fall-back mechanism and introducing some linguistic rules, to deal with unseen tokens and ambiguous words, gives an overall accuracy of 93.14%. Finally, we introduce rules for language identification from sequences of recognised words.

Learning to Compose Spatial Relations with Grounded Neural Language Models
Mehdi Ghanimifard | Simon Dobnik
IWCS 2017 - 12th International Conference on Computational Semantics - Long papers

An overview of Natural Language Inference Data Collection: The way forward?
Stergios Chatzikyriakidis | Robin Cooper | Simon Dobnik | Staffan Larsson
Proceedings of the Computing Natural Language Inference Workshop

2015

Probabilistic Type Theory and Natural Language Semantics
Robin Cooper | Simon Dobnik | Shalom Lappin | Staffan Larsson
Linguistic Issues in Language Technology, Volume 10, 2015

Type theory has played an important role in specifying the formal connection between syntactic structure and semantic interpretation within the history of formal semantics. In recent years rich type theories developed for the semantics of programming languages have become influential in the semantics of natural language. The use of probabilistic reasoning to model human learning and cognition has become an increasingly important part of cognitive science. In this paper we offer a probabilistic formulation of a rich type theory, Type Theory with Records (TTR), and we illustrate how this framework can be used to approach the problem of semantic learning. Our probabilistic version of TTR is intended to provide an interface between the cognitive process of classifying situations according to the types that they instantiate, and the compositional semantics of natural language.
