2020
Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions
David M. Howcroft, Anya Belz, Miruna-Adriana Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, Verena Rieser
Proceedings of the 13th International Conference on Natural Language Generation
Human assessment remains the most trusted form of evaluation in NLG, but highly diverse approaches and a proliferation of different quality criteria used by researchers make it difficult to compare results and draw conclusions across papers, with adverse implications for meta-evaluation and reproducibility. In this paper, we present (i) our dataset of 165 NLG papers with human evaluations, (ii) the annotation scheme we developed to label the papers for different aspects of evaluations, (iii) quantitative analyses of the annotations, and (iv) a set of recommendations for improving standards in evaluation reporting. We use the annotations as a basis for examining the information included in evaluation reports and the level of consistency in approaches, experimental design and terminology, focusing in particular on the more than 200 different terms that have been used for evaluated aspects of quality. We conclude that, due to a pervasive lack of clarity in reports and extreme diversity in approaches, human evaluation in NLG appears extremely confused as of 2020, and that the field urgently needs standard methods and terminology.
2018
DR-BiLSTM: Dependent Reading Bidirectional LSTM for Natural Language Inference
Reza Ghaeini, Sadid A. Hasan, Vivek Datla, Joey Liu, Kathy Lee, Ashequl Qadir, Yuan Ling, Aaditya Prakash, Xiaoli Fern, Oladimeji Farri
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
We present a novel deep learning architecture for the natural language inference (NLI) task. Existing approaches mostly rely on simple reading mechanisms that encode the premise and hypothesis independently. Instead, we propose a dependent reading bidirectional LSTM network (DR-BiLSTM) to efficiently model the relationship between a premise and a hypothesis during encoding and inference. We also introduce a sophisticated ensemble strategy to combine our proposed models, which noticeably improves final predictions. Finally, we demonstrate how the results can be improved further with an additional preprocessing step. Our evaluation shows that DR-BiLSTM obtains the best single-model and ensemble results, achieving new state-of-the-art scores on the Stanford NLI dataset.
2017
Improving Clinical Diagnosis Inference through Integration of Structured and Unstructured Knowledge
Yuan Ling, Yuan An, Sadid Hasan
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications
This paper presents a novel approach to automatically inferring the most probable diagnosis from a given clinical narrative. Structured knowledge bases (KBs) are useful for such complex tasks but are not sufficient on their own. Hence, we integrate a large amount of unstructured free text with structured KBs. The key ideas are building a concept graph from both structured and unstructured knowledge sources and ranking diagnosis concepts using enhanced word embedding vectors learned from the integrated sources. Experiments on the TREC CDS and HumanDx datasets showed that our methods improved the results of clinical diagnosis inference.
Learning to Diagnose: Assimilating Clinical Narratives using Deep Reinforcement Learning
Yuan Ling, Sadid A. Hasan, Vivek Datla, Ashequl Qadir, Kathy Lee, Joey Liu, Oladimeji Farri
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Clinical diagnosis is a critical and non-trivial aspect of patient care that often requires significant medical research and investigation based on an underlying clinical scenario. This paper proposes a novel approach by formulating clinical diagnosis as a reinforcement learning problem. During training, the reinforcement learning agent mimics the clinician’s cognitive process and learns the optimal policy to obtain the most appropriate diagnoses for a clinical narrative. This is achieved through an iterative search for candidate diagnoses from external knowledge sources via a sentence-by-sentence analysis of the inherent clinical context. A deep Q-network architecture is trained to optimize a reward function that measures the accuracy of the candidate diagnoses. Experiments on the TREC CDS datasets demonstrate the effectiveness of our system compared with various systems not based on reinforcement learning.
2016
Neural Clinical Paraphrase Generation with Attention
Sadid A. Hasan, Bo Liu, Joey Liu, Ashequl Qadir, Kathy Lee, Vivek Datla, Aaditya Prakash, Oladimeji Farri
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)
Paraphrase generation is important in various applications such as search, summarization, and question answering due to its ability to generate textual alternatives while keeping the overall meaning intact. Clinical paraphrase generation is especially vital in building patient-centric clinical decision support (CDS) applications, where users can understand complex clinical jargon via easily comprehensible alternative paraphrases. This paper presents Neural Clinical Paraphrase Generation (NCPG), a novel approach that casts the task as a monolingual neural machine translation (NMT) problem. We propose an end-to-end neural network built on an attention-based bidirectional Recurrent Neural Network (RNN) architecture with an encoder-decoder framework to perform the task. Conventional bilingual NMT models mostly rely on word-level modeling and are often limited by out-of-vocabulary (OOV) issues. In contrast, we represent the source and target paraphrase pairs as character sequences to address this limitation. To the best of our knowledge, this is the first work that uses attention-based RNNs for clinical paraphrase generation and also proposes end-to-end character-level modeling for this task. Extensive experiments on a large curated clinical paraphrase corpus show that the attention-based NCPG models achieve improvements of up to 5.2 BLEU points and 0.5 METEOR points over a strong non-attention-based baseline for word-level modeling, whereas further gains of up to 6.1 BLEU points and 1.3 METEOR points are obtained by the character-level NCPG models over their word-level counterparts. Overall, our models demonstrate performance comparable to the state-of-the-art phrase-based non-neural models.
Neural Paraphrase Generation with Stacked Residual LSTM Networks
Aaditya Prakash, Sadid A. Hasan, Kathy Lee, Vivek Datla, Ashequl Qadir, Joey Liu, Oladimeji Farri
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
In this paper, we propose a novel neural approach for paraphrase generation. Conventional paraphrase generation methods either leverage hand-written rules and thesaurus-based alignments or use statistical machine learning principles. To the best of our knowledge, this work is the first to explore deep learning models for paraphrase generation. Our primary contribution is a stacked residual LSTM network, where we add residual connections between LSTM layers. This allows for efficient training of deep LSTMs. We evaluate our model and other state-of-the-art deep learning models on three different datasets: PPDB, WikiAnswers, and MSCOCO. Evaluation results demonstrate that our model outperforms sequence-to-sequence, attention-based, and bidirectional LSTM models on BLEU, METEOR, TER, and an embedding-based sentence similarity metric.
2015
Towards Topic-to-Question Generation
Yllias Chali, Sadid A. Hasan
Computational Linguistics, Volume 41, Issue 1, March 2015
2014
Fear the REAPER: A System for Automatic Multi-Document Summarization with Reinforcement Learning
Cody Rioux, Sadid A. Hasan, Yllias Chali
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
2013
On the Effectiveness of Using Syntactic and Shallow Semantic Tree Kernels for Automatic Assessment of Essays
Yllias Chali, Sadid A. Hasan
Proceedings of the Sixth International Joint Conference on Natural Language Processing
Using POMDPs for Topic-Focused Multi-Document Summarization (L’utilisation des POMDP pour les résumés multi-documents orientés par une thématique) [in French]
Yllias Chali, Sadid A. Hasan, Mustapha Mojahid
Proceedings of TALN 2013 (Volume 1: Long Papers)
2012
On the Effectiveness of using Sentence Compression Models for Query-Focused Multi-Document Summarization
Yllias Chali, Sadid A. Hasan
Proceedings of COLING 2012
Towards Automatic Topical Question Generation
Yllias Chali, Sadid A. Hasan
Proceedings of COLING 2012
Automatically Assessing Free Texts
Yllias Chali, Sadid A. Hasan
Proceedings of the Workshop on Speech and Language Processing Tools in Education
Simple or Complex? Classifying Questions by Answering Complexity
Yllias Chali, Sadid A. Hasan
Proceedings of the Workshop on Question Answering for Complex Domains
2011
Using Syntactic and Shallow Semantic Kernels to Improve Multi-Modality Manifold-Ranking for Topic-Focused Multi-Document Summarization
Yllias Chali, Sadid A. Hasan, Kaisar Imam
Proceedings of the 5th International Joint Conference on Natural Language Processing
2010
Automatic Question Generation from Sentences
Husam Ali, Yllias Chali, Sadid A. Hasan
Proceedings of the 17th TALN Conference (Traitement Automatique des Langues Naturelles): Short Papers
Question Generation (QG) and Question Answering (QA) are among the many challenges for natural language understanding and interfaces. Since humans need to ask good questions, automated QG systems could help them meet their inquiry needs. In this paper, we consider an automatic Sentence-to-Question generation task, where, given a sentence, the QG system generates a set of questions for which the sentence contains, implies, or needs answers. To facilitate the question generation task, we build elementary sentences from the input complex sentences using a syntactic parser. A named entity recognizer and a part-of-speech tagger are applied to each of these sentences to encode the necessary information. We classify the sentences based on their subject, verb, object and preposition to determine the possible types of questions to be generated. We use the TREC-2007 (Question Answering Track) dataset for our experiments and evaluation.
2009
Do Automatic Annotation Techniques Have Any Impact on Supervised Complex Question Answering?
Yllias Chali, Sadid Hasan, Shafiq Joty
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers