Elnaz Davoodi


Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization
Aishwarya Agrawal | Ivana Kajic | Emanuele Bugliarello | Elnaz Davoodi | Anita Gergely | Phil Blunsom | Aida Nematzadeh
Findings of the Association for Computational Linguistics: EACL 2023

Vision-and-language (V&L) models pretrained on large-scale multimodal data have demonstrated strong performance on various tasks such as image captioning and visual question answering (VQA). The quality of such models is commonly assessed by measuring their performance on unseen data that typically comes from the same distribution as the training data. However, when evaluated under out-of-distribution (out-of-dataset) settings for VQA, we observe that these models exhibit poor generalization. We comprehensively evaluate two pretrained V&L models under different settings (i.e. classification and open-ended text generation) by conducting cross-dataset evaluations. We find that these models tend to learn to solve the benchmark, rather than learning the high-level skills required by the VQA task. We also find that in most cases generative models are less susceptible to shifts in data distribution compared to discriminative ones, and that multimodal pretraining is generally helpful for OOD generalization. Finally, we revisit assumptions underlying the use of automatic VQA evaluation metrics, and empirically show that their stringent nature repeatedly penalizes models for correct responses.


The E2E NLG Challenge: A Tale of Two Systems
Charese Smiley | Elnaz Davoodi | Dezhao Song | Frank Schilder
Proceedings of the 11th International Conference on Natural Language Generation

This paper presents the two systems we entered into the 2017 E2E NLG Challenge: TemplGen, a template-based system, and SeqGen, a neural network-based system. In the automatic evaluation, SeqGen achieved results competitive with the template-based approach and with other participating systems. In addition to the automatic evaluation, we present and discuss the human evaluation results of our two systems.


Automatic Identification of AltLexes using Monolingual Parallel Corpora
Elnaz Davoodi | Leila Kosseim
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

The automatic identification of discourse relations is still a challenging task in natural language processing. Discourse connectives, such as since or but, are the most informative cues to identify explicit relations; however, discourse parsers typically use a closed inventory of such connectives. As a result, discourse relations signalled by markers outside these inventories (i.e. AltLexes) are not detected as effectively. In this paper, we propose a novel method to leverage parallel corpora in text simplification and lexical resources to automatically identify alternative lexicalizations that signal discourse relations. When applied to the Simple Wikipedia and Newsela corpora along with WordNet and the PPDB, the method allowed the automatic discovery of 91 AltLexes.


On the Contribution of Discourse Structure on Text Complexity Assessment
Elnaz Davoodi | Leila Kosseim
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

CLaC at SemEval-2016 Task 11: Exploring linguistic and psycho-linguistic Features for Complex Word Identification
Elnaz Davoodi | Leila Kosseim
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)


The CLaC Discourse Parser at CoNLL-2015
Majid Laali | Elnaz Davoodi | Leila Kosseim
Proceedings of the Nineteenth Conference on Computational Natural Language Learning - Shared Task