John Kelleher

Also published as: John D. Kelleher

2021

Finding BERT’s Idiomatic Key
Vasudevan Nedumpozhimana | John Kelleher
Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021)

Sentence embeddings encode information relating to the usage of idioms in a sentence. This paper reports a set of experiments that combine a probing methodology with input masking to analyse where in a sentence this idiomatic information is taken from, and what form it takes. Our results indicate that BERT’s idiomatic key is primarily found within an idiomatic expression, but also draws on information from the surrounding context. Also, BERT can distinguish between the disruption in a sentence caused by words missing and the incongruity caused by idiomatic usage.

Poisoning Knowledge Graph Embeddings via Relation Inference Patterns
Peru Bhardwaj | John Kelleher | Luca Costabello | Declan O’Sullivan
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We study the problem of generating data poisoning attacks against Knowledge Graph Embedding (KGE) models for the task of link prediction in knowledge graphs. To poison KGE models, we propose to exploit their inductive abilities, which are captured through relationship patterns such as symmetry, inversion and composition in the knowledge graph. Specifically, to degrade the model’s prediction confidence on target facts, we propose to improve the model’s prediction confidence on a set of decoy facts. Thus, we craft adversarial additions that can improve the model’s prediction confidence on decoy facts through different inference patterns. Our experiments demonstrate that the proposed poisoning attacks outperform state-of-the-art baselines on four KGE models for two publicly available datasets. We also find that the symmetry pattern based attacks generalize across all model-dataset combinations, which indicates the sensitivity of KGE models to this pattern.

Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods
Peru Bhardwaj | John Kelleher | Luca Costabello | Declan O’Sullivan
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Despite the widespread use of Knowledge Graph Embeddings (KGE), little is known about the security vulnerabilities that might disrupt their intended behaviour. We study data poisoning attacks against KGE models for link prediction. These attacks craft adversarial additions or deletions at training time to cause model failure at test time. To select adversarial deletions, we propose to use the model-agnostic instance attribution methods from Interpretable Machine Learning, which identify the training instances that are most influential to a neural model’s predictions on test instances. We use these influential triples as adversarial deletions. We further propose a heuristic method to replace one of the two entities in each influential triple to generate adversarial additions. Our experiments show that the proposed strategies outperform the state-of-the-art data poisoning attacks on KGE models and improve the MRR degradation due to the attacks by up to 62% over the baselines.

2020

Energy-based Neural Modelling for Large-Scale Multiple Domain Dialogue State Tracking
Anh Duong Trinh | Robert J. Ross | John D. Kelleher
Proceedings of the Fourth Workshop on Structured Prediction for NLP

Scaling up dialogue state tracking to multiple domains is challenging due to the growth in the number of variables being tracked. Furthermore, dialogue state tracking models do not yet explicitly make use of relationships between dialogue variables, such as slots across domains. We propose using energy-based structured prediction methods for the large-scale dialogue state tracking task in two multiple domain dialogue datasets. Our results indicate that: (i) modelling variable dependencies yields better results; and (ii) the structured prediction output aligns with the dialogue slot-value constraint principles. This leads to promising directions to improve state-of-the-art models by incorporating variable dependencies into their prediction process.

Proceedings of the 13th International Conference on Natural Language Generation
Brian Davis | Yvette Graham | John Kelleher | Yaji Sripada
Proceedings of the 13th International Conference on Natural Language Generation

Language-Driven Region Pointer Advancement for Controllable Image Captioning
Annika Lindh | Robert Ross | John Kelleher
Proceedings of the 28th International Conference on Computational Linguistics

Controllable Image Captioning is a recent sub-field in the multi-modal task of Image Captioning wherein constraints are placed on which regions in an image should be described in the generated natural language caption. This puts a stronger focus on producing more detailed descriptions, and opens the door for more end-user control over results. A vital component of the Controllable Image Captioning architecture is the mechanism that decides the timing of attending to each region through the advancement of a region pointer. In this paper, we propose a novel method for predicting the timing of region pointer advancement by treating the advancement step as a natural part of the language structure via a NEXT-token, motivated by a strong correlation to the sentence structure in the training data. We find that our timing agrees with the ground-truth timing in the Flickr30k Entities test data with a precision of 86.55% and a recall of 97.92%. Our model implementing this technique improves the state-of-the-art on standard captioning metrics while additionally demonstrating a considerably larger effective vocabulary size.

Style versus Content: A distinction without a (learnable) difference?
Somayeh Jafaritazehjani | Gwénolé Lecorvé | Damien Lolive | John Kelleher
Proceedings of the 28th International Conference on Computational Linguistics

Textual style transfer involves modifying the style of a text while preserving its content. This assumes that it is possible to separate style from content. This paper investigates whether this separation is possible. We use sentiment transfer as our case study for style transfer analysis. Our experimental methodology frames style transfer as a multi-objective problem, balancing style shift with content preservation and fluency. Due to the lack of parallel data for style transfer, we employ a variety of adversarial encoder-decoder networks in our experiments. Also, we use a probing methodology to analyse how these models encode style-related features in their latent spaces. The results of our experiments, which are further confirmed by a human evaluation, reveal the inherent trade-off between the multiple style transfer objectives, which indicates that style cannot be usefully separated from content within these style-transfer systems.

English WordNet Random Walk Pseudo-Corpora
Filip Klubička | Alfredo Maldonado | Abhijit Mahalunkar | John Kelleher
Proceedings of the 12th Language Resources and Evaluation Conference

This is a resource description paper that describes the creation and properties of a set of pseudo-corpora generated artificially from a random walk over the English WordNet taxonomy. Our WordNet taxonomic random walk implementation allows the exploration of different random walk hyperparameters and the generation of a variety of different pseudo-corpora. We find that different combinations of parameters result in varying statistical properties of the generated pseudo-corpora. We have published a total of 81 pseudo-corpora that we have used in our previous research, but have not exhausted all possible combinations of hyperparameters, which is why we have also published a codebase that allows the generation of additional WordNet taxonomic pseudo-corpora as needed. Ultimately, such pseudo-corpora can be used to train taxonomic word embeddings, as a way of transferring taxonomic knowledge into a word embedding space.

2019

Synthetic, yet natural: Properties of WordNet random walk corpora and the impact of rare words on embedding performance
Filip Klubička | Alfredo Maldonado | Abhijit Mahalunkar | John Kelleher
Proceedings of the 10th Global Wordnet Conference

Creating word embeddings that reflect semantic relationships encoded in lexical knowledge resources is an open challenge. One approach is to use a random walk over a knowledge graph to generate a pseudo-corpus and use this corpus to train embeddings. However, the effect of the shape of the knowledge graph on the generated pseudo-corpora, and on the resulting word embeddings, has not been studied. To explore this, we use English WordNet, constrained to the taxonomic (tree-like) portion of the graph, as a case study. We investigate the properties of the generated pseudo-corpora, and their impact on the resulting embeddings. We find that the distributions in the pseudo-corpora exhibit properties found in natural corpora, such as Zipf’s and Heaps’ law, and also observe that the proportion of rare words in a pseudo-corpus affects the performance of its embeddings on word similarity.

Multi-Element Long Distance Dependencies: Using SPk Languages to Explore the Characteristics of Long-Distance Dependencies
Abhijit Mahalunkar | John Kelleher
Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges

In order to successfully model Long Distance Dependencies (LDDs) it is necessary to understand the full range of the characteristics of the LDDs exhibited in a target dataset. In this paper, we use Strictly k-Piecewise languages to generate datasets with various properties. We then compute the characteristics of the LDDs in these datasets using mutual information and analyze the impact of factors such as (i) k, (ii) length of LDDs, (iii) vocabulary size, (iv) forbidden strings, and (v) dataset size. This analysis reveals that the number of interacting elements in a dependency is an important characteristic of LDDs. This leads us to the challenge of modelling multi-element long-distance dependencies. Our results suggest that attention mechanisms in neural networks may aid in modeling datasets with multi-element long-distance dependencies. However, we conclude that there is a need to develop more efficient attention mechanisms to address this issue.

Energy-Based Modelling for Dialogue State Tracking
Anh Duong Trinh | Robert Ross | John Kelleher
Proceedings of the First Workshop on NLP for Conversational AI

The uncertainties of language and the complexity of dialogue contexts make accurate dialogue state tracking one of the more challenging aspects of dialogue processing. To improve state tracking quality, we argue that relationships between different aspects of dialogue state must be taken into account as they can often guide a more accurate interpretation process. To this end, we present an energy-based approach to dialogue state tracking as a structured classification task. The novelty of our approach lies in the use of an energy network on top of a deep learning architecture to explore more signal correlations between network variables including input features and output labels. We demonstrate that the energy-based approach improves the performance of a deep learning dialogue state tracker towards state-of-the-art results without the need for many of the other steps required by current state-of-the-art methods.

Capturing Dialogue State Variable Dependencies with an Energy-based Neural Dialogue State Tracker
Anh Duong Trinh | Robert J. Ross | John D. Kelleher
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Dialogue state tracking requires the population and maintenance of a multi-slot frame representation of the dialogue state. Frequently, dialogue state tracking systems assume independence between slot values within a frame. In this paper we argue that treating the prediction of each slot value as an independent prediction task may ignore important associations between the slot values, and, consequently, we argue that treating dialogue state tracking as a structured prediction problem can help to improve dialogue state tracking performance. To support this argument, the research presented in this paper is structured into three stages: (i) analyzing variable dependencies in dialogue data; (ii) applying an energy-based methodology to model dialogue state tracking as a structured prediction task; and (iii) evaluating the impact of inter-slot relationships on model performance. Overall we demonstrate that modelling the associations between target slots with an energy-based formalism improves dialogue state tracking performance in a number of ways.

Persistence pays off: Paying Attention to What the LSTM Gating Mechanism Persists
Giancarlo Salton | John Kelleher
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Recurrent Neural Network Language Models composed of LSTM units, especially those augmented with an external memory, have achieved state-of-the-art results in Language Modeling. However, because of information fading, these models still struggle to process long sequences, which are more likely to contain long-distance dependencies. In this paper we demonstrate an effective mechanism for retrieving information in a memory-augmented LSTM LM based on attending to information in memory in proportion to the number of timesteps the LSTM gating mechanism persisted the information.

Bigger versus Similar: Selecting a Background Corpus for First Story Detection Based on Distributional Similarity
Fei Wang | Robert J. Ross | John D. Kelleher
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

The current state-of-the-art models for First Story Detection (FSD) are nearest neighbour-based models with traditional term vector representations; however, one challenge faced by FSD models is that the document representation is usually defined by the vocabulary and term frequency from a background corpus. Consequently, the ideal background corpus should arguably be both large-scale to ensure adequate term coverage, and similar to the target domain in terms of the language distribution. However, given that these two factors cannot always be mutually satisfied, in this paper we examine whether the distributional similarity of common terms is more important than the scale of common terms for FSD. As a basis for our analysis we propose a set of metrics to quantitatively measure the scale of common terms and the distributional similarity between corpora. Using these metrics we rank different background corpora relative to a target corpus. We also apply models based on different background corpora to the FSD task. Our results show that term distributional similarity is more predictive of good FSD performance than the scale of common terms; thus, we demonstrate that a smaller recent domain-related corpus will be more suitable than a very large-scale general corpus for FSD.

2018

Exploring the Functional and Geometric Bias of Spatial Relations Using Neural Language Models
Simon Dobnik | Mehdi Ghanimifard | John Kelleher
Proceedings of the First International Workshop on Spatial Language Understanding

The challenge for computational models of spatial descriptions for situated dialogue systems is the integration of information from different modalities. The semantics of spatial descriptions are grounded in at least two sources of information: (i) a geometric representation of space and (ii) the functional interaction of related objects. We train several neural language models on descriptions of scenes from a dataset of image captions and examine whether the functional or geometric bias of spatial descriptions reported in the literature is reflected in the estimated perplexity of these models. The results of these experiments have implications for the creation of models of spatial lexical semantics for human-robot dialogue systems. Furthermore, they also provide an insight into the kinds of semantic knowledge captured by neural language models trained on spatial descriptions, which has implications for image captioning systems.

Is it worth it? Budget-related evaluation metrics for model selection
Filip Klubička | Giancarlo D. Salton | John D. Kelleher
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Attentive Language Models
Giancarlo Salton | Robert Ross | John Kelleher
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this paper, we extend Recurrent Neural Network Language Models (RNN-LMs) with an attention mechanism. We show that an “attentive” RNN-LM (with 11M parameters) achieves a better perplexity than larger RNN-LMs (with 66M parameters) and achieves performance comparable to an ensemble of 10 similarly sized RNN-LMs. We also show that an “attentive” RNN-LM needs less contextual information to achieve similar results to the state-of-the-art on the wikitext2 dataset.

Idiom Type Identification with Smoothed Lexical Features and a Maximum Margin Classifier
Giancarlo Salton | Robert Ross | John Kelleher
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

In our work we address limitations in the state-of-the-art in idiom type identification. We investigate different approaches for a lexical fixedness metric, a component of the state-of-the-art model. We also show that our Machine Learning based approach to the idiom type identification task achieves an F1-score of 0.85, an improvement of 11 points over the state-of-the-art.
