Oana-Maria Camburu


2022

pdf
Few-Shot Out-of-Domain Transfer Learning of Natural Language Explanations in a Label-Abundant Setup
Yordan Yordanov | Vid Kocijan | Thomas Lukasiewicz | Oana-Maria Camburu
Findings of the Association for Computational Linguistics: EMNLP 2022

Training a model to provide natural language explanations (NLEs) for its predictions usually requires the acquisition of task-specific NLEs, which is time- and resource-consuming. A potential solution is the few-shot out-of-domain transfer of NLEs from a parent task with many NLEs to a child task.In this work, we examine the setup in which the child task has few NLEs but abundant labels. We establish four few-shot transfer learning methods that cover the possible fine-tuning combinations of the labels and NLEs for the parent and child tasks. We transfer explainability from a large natural language inference dataset (e-SNLI) separately to two child tasks: (1) hard cases of pronoun resolution, where we introduce the small-e-WinoGrande dataset of NLEs on top of the WinoGrande dataset, and (2) commonsense validation (ComVE). Our results demonstrate that the parent task helps with NLE generation and we establish the best methods for this setup.

2021

pdf bib
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)
Anna Rogers | Iacer Calixto | Ivan Vulić | Naomi Saphra | Nora Kassner | Oana-Maria Camburu | Trapit Bansal | Vered Shwartz
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)

2020

pdf
Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
Oana-Maria Camburu | Brendan Shillingford | Pasquale Minervini | Thomas Lukasiewicz | Phil Blunsom
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

To increase trust in artificial intelligence systems, a promising research direction consists of designing neural models capable of generating natural language explanations for their predictions. In this work, we show that such models are nonetheless prone to generating mutually inconsistent explanations, such as ”Because there is a dog in the image.” and ”Because there is no dog in the [same] image.”, exposing flaws in either the decision-making process of the model or in the generation of the explanations. We introduce a simple yet effective adversarial framework for sanity checking models against the generation of inconsistent natural language explanations. Moreover, as part of the framework, we address the problem of adversarial attacks with full target sequences, a scenario that was not previously addressed in sequence-to-sequence attacks. Finally, we apply our framework on a state-of-the-art neural natural language inference model that provides natural language explanations for its predictions. Our framework shows that this model is capable of generating a significant number of inconsistent explanations.

pdf
Does the Objective Matter? Comparing Training Objectives for Pronoun Resolution
Yordan Yordanov | Oana-Maria Camburu | Vid Kocijan | Thomas Lukasiewicz
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Hard cases of pronoun resolution have been used as a long-standing benchmark for commonsense reasoning. In the recent literature, pre-trained language models have been used to obtain state-of-the-art results on pronoun resolution. Overall, four categories of training and evaluation objectives have been introduced. The variety of training datasets and pre-trained language models used in these works makes it unclear whether the choice of training objective is critical. In this work, we make a fair comparison of the performance and seed-wise stability of four models that represent the four categories of objectives. Our experiments show that the objective of sequence ranking performs the best in-domain, while the objective of semantic similarity between candidates and pronoun performs the best out-of-domain. We also observe a seed-wise instability of the model using sequence ranking, which is not the case when the other objectives are used.

2019

pdf
A Surprisingly Robust Trick for the Winograd Schema Challenge
Vid Kocijan | Ana-Maria Cretu | Oana-Maria Camburu | Yordan Yordanov | Thomas Lukasiewicz
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The Winograd Schema Challenge (WSC) dataset WSC273 and its inference counterpart WNLI are popular benchmarks for natural language understanding and commonsense reasoning. In this paper, we show that the performance of three language models on WSC273 consistently and robustly improves when fine-tuned on a similar pronoun disambiguation problem dataset (denoted WSCR). We additionally generate a large unsupervised WSC-like dataset. By fine-tuning the BERT language model both on the introduced and on the WSCR dataset, we achieve overall accuracies of 72.5% and 74.7% on WSC273 and WNLI, improving the previous state-of-the-art solutions by 8.8% and 9.6%, respectively. Furthermore, our fine-tuned models are also consistently more accurate on the “complex” subsets of WSC273, introduced by Trichelair et al. (2018).

pdf
WikiCREM: A Large Unsupervised Corpus for Coreference Resolution
Vid Kocijan | Oana-Maria Camburu | Ana-Maria Cretu | Yordan Yordanov | Phil Blunsom | Thomas Lukasiewicz
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Pronoun resolution is a major area of natural language understanding. However, large-scale training sets are still scarce, since manually labelling data is costly. In this work, we introduce WikiCREM (Wikipedia CoREferences Masked) a large-scale, yet accurate dataset of pronoun disambiguation instances. We use a language-model-based approach for pronoun resolution in combination with our WikiCREM dataset. We compare a series of models on a collection of diverse and challenging coreference resolution problems, where we match or outperform previous state-of-the-art approaches on 6 out of 7 datasets, such as GAP, DPR, WNLI, PDP, WinoBias, and WinoGender. We release our model to be used off-the-shelf for solving pronoun disambiguation.