Ahmed Elgohary


NL-EDIT: Correcting Semantic Parse Errors through Natural Language Interaction
Ahmed Elgohary | Christopher Meek | Matthew Richardson | Adam Fourney | Gonzalo Ramos | Ahmed Hassan Awadallah
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We study semantic parsing in an interactive setting in which users correct errors with natural language feedback. We present NL-EDIT, a model for interpreting natural language feedback in the interaction context to generate a sequence of edits that can be applied to the initial parse to correct its errors. We show that NL-EDIT can boost the accuracy of existing text-to-SQL parsers by up to 20% with only one turn of correction. We analyze the limitations of the model and discuss directions for improvement and evaluation. The code and datasets used in this paper are publicly available at http://aka.ms/NLEdit.


Speak to your Parser: Interactive Text-to-SQL with Natural Language Feedback
Ahmed Elgohary | Saghar Hosseini | Ahmed Hassan Awadallah
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We study the task of semantic parse correction with natural language feedback. Given a natural language utterance, most semantic parsing systems pose the problem as one-shot translation where the utterance is mapped to a corresponding logical form. In this paper, we investigate a more interactive scenario where humans can further interact with the system by providing free-form natural language feedback to correct the system when it generates an inaccurate interpretation of an initial utterance. We focus on natural language to SQL systems and construct, SPLASH, a dataset of utterances, incorrect SQL interpretations and the corresponding natural language feedback. We compare various reference models for the correction task and show that incorporating such a rich form of feedback can significantly improve the overall semantic parsing accuracy while retaining the flexibility of natural language interaction. While we estimated human correction accuracy is 81.5%, our best model achieves only 25.1%, which leaves a large gap for improvement in future research. SPLASH is publicly available at https://aka.ms/Splash_dataset.

It Takes Two to Lie: One to Lie, and One to Listen
Denis Peskov | Benny Cheng | Ahmed Elgohary | Joe Barrow | Cristian Danescu-Niculescu-Mizil | Jordan Boyd-Graber
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Trust is implicit in many online text conversations—striking up new friendships, or asking for tech support. But trust can be betrayed through deception. We study the language and dynamics of deception in the negotiation-based game Diplomacy, where seven players compete for world domination by forging and breaking alliances with each other. Our study with players from the Diplomacy community gathers 17,289 messages annotated by the sender for their intended truthfulness and by the receiver for their perceived truthfulness. Unlike existing datasets, this captures deception in long-lasting relationships, where the interlocutors strategically combine truth with lies to advance objectives. A model that uses power dynamics and conversational contexts can predict when a lie occurs nearly as well as human players.


Can You Unpack That? Learning to Rewrite Questions-in-Context
Ahmed Elgohary | Denis Peskov | Jordan Boyd-Graber
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Question answering is an AI-complete problem, but existing datasets lack key elements of language understanding such as coreference and ellipsis resolution. We consider sequential question answering: multiple questions are asked one-by-one in a conversation between a questioner and an answerer. Answering these questions is only possible through understanding the conversation history. We introduce the task of question-in-context rewriting: given the context of a conversation’s history, rewrite a context-dependent into a self-contained question with the same answer. We construct, CANARD, a dataset of 40,527 questions based on QuAC (Choi et al., 2018) and train Seq2Seq models for incorporating context into standalone questions.


Construction of the Literature Graph in Semantic Scholar
Waleed Ammar | Dirk Groeneveld | Chandra Bhagavatula | Iz Beltagy | Miles Crawford | Doug Downey | Jason Dunkelberger | Ahmed Elgohary | Sergey Feldman | Vu Ha | Rodney Kinney | Sebastian Kohlmeier | Kyle Lo | Tyler Murray | Hsu-Han Ooi | Matthew Peters | Joanna Power | Sam Skjonsberg | Lucy Lu Wang | Chris Wilhelm | Zheng Yuan | Madeleine van Zuylen | Oren Etzioni
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph consists of more than 280M nodes, representing papers, authors, entities and various interactions between them (e.g., authorships, citations, entity mentions). We reduce literature graph construction into familiar NLP tasks (e.g., entity extraction and linking), point out research challenges due to differences from standard formulations of these tasks, and report empirical results for each task. The methods described in this paper are used to enable semantic features in www.semanticscholar.org.

A dataset and baselines for sequential open-domain question answering
Ahmed Elgohary | Chen Zhao | Jordan Boyd-Graber
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Previous work on question-answering systems mainly focuses on answering individual questions, assuming they are independent and devoid of context. Instead, we investigate sequential question answering, asking multiple related questions. We present QBLink, a new dataset of fully human-authored questions. We extend existing strong question answering frameworks to include previous questions to improve the overall question-answering accuracy in open-domain question answering. The dataset is publicly available at http://sequential.qanta.org.

Generating Natural Language Adversarial Examples
Moustafa Alzantot | Yash Sharma | Ahmed Elgohary | Bo-Jhang Ho | Mani Srivastava | Kai-Wei Chang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Deep neural networks (DNNs) are vulnerable to adversarial examples, perturbations to correctly classified examples which can cause the model to misclassify. In the image domain, these perturbations can often be made virtually indistinguishable to human perception, causing humans and state-of-the-art models to disagree. However, in the natural language domain, small perturbations are clearly perceptible, and the replacement of a single word can drastically alter the semantics of the document. Given these challenges, we use a black-box population-based optimization algorithm to generate semantically and syntactically similar adversarial examples that fool well-trained sentiment analysis and textual entailment models with success rates of 97% and 70%, respectively. We additionally demonstrate that 92.3% of the successful sentiment analysis adversarial examples are classified to their original label by 20 human annotators, and that the examples are perceptibly quite similar. Finally, we discuss an attempt to use adversarial training as a defense, but fail to yield improvement, demonstrating the strength and diversity of our adversarial examples. We hope our findings encourage researchers to pursue improving the robustness of DNNs in the natural language domain.

Assessing Composition in Sentence Vector Representations
Allyson Ettinger | Ahmed Elgohary | Colin Phillips | Philip Resnik
Proceedings of the 27th International Conference on Computational Linguistics

An important component of achieving language understanding is mastering the composition of sentence meaning, but an immediate challenge to solving this problem is the opacity of sentence vector representations produced by current neural sentence composition models. We present a method to address this challenge, developing tasks that directly target compositional meaning information in sentence vector representations with a high degree of precision and control. To enable the creation of these controlled tasks, we introduce a specialized sentence generation system that produces large, annotated sentence sets meeting specified syntactic, semantic and lexical constraints. We describe the details of the method and generation system, and then present results of experiments applying our method to probe for compositional information in embeddings from a number of existing sentence composition models. We find that the method is able to extract useful information about the differing capacities of these models, and we discuss the implications of our results with respect to these systems’ capturing of sentence information. We make available for public use the datasets used for these experiments, as well as the generation system.


Probing for semantic evidence of composition by means of simple classification tasks
Allyson Ettinger | Ahmed Elgohary | Philip Resnik
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP

Learning Monolingual Compositional Representations via Bilingual Supervision
Ahmed Elgohary | Marine Carpuat
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)