Deep neural networks have demonstrated high performance on many natural language processing (NLP) tasks that can be solved directly from text, but have struggled on NLP tasks requiring external (e.g., world) knowledge. In this paper, we present OSCR (Ontology-based Semantic Composition Regularization), a method for injecting task-agnostic knowledge from an ontology or knowledge graph into a neural network during pre-training. We evaluated BERT pre-trained on Wikipedia with and without OSCR by fine-tuning on two question answering tasks involving world knowledge and causal reasoning and one requiring domain (healthcare) knowledge, obtaining accuracy improvements of 33.3%, 18.6%, and 4% over BERT pre-trained without OSCR.
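The abstract does not specify how OSCR's regularization enters training, so the following is only a minimal sketch of the general idea under stated assumptions: a margin-based auxiliary loss (hypothetical, not the actual OSCR objective) that pulls the representations of concepts linked in the knowledge graph closer together during pre-training. The function and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def knowledge_regularizer(concept_emb, triples, margin=1.0):
    """Hypothetical auxiliary loss: concepts linked in the knowledge graph
    should lie closer together than randomly paired (negative) concepts.

    concept_emb: (num_concepts, dim) tensor of concept representations
    triples: list of (head, tail, negative) concept-index triples
    """
    heads = concept_emb[torch.tensor([h for h, _, _ in triples])]
    tails = concept_emb[torch.tensor([t for _, t, _ in triples])]
    negs = concept_emb[torch.tensor([n for _, _, n in triples])]
    pos_dist = F.pairwise_distance(heads, tails)
    neg_dist = F.pairwise_distance(heads, negs)
    # Hinge on the margin: penalize linked pairs that are farther apart
    # than the corresponding negative pairs.
    return F.relu(margin + pos_dist - neg_dist).mean()

# During pre-training, such a term would be combined with the usual
# masked-language-modeling loss, e.g.:
#   total_loss = mlm_loss + reg_weight * knowledge_regularizer(emb, kg_triples)
```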
Recent work has shown that pre-trained Transformers obtain remarkable performance on many natural language processing tasks, including automatic summarization. However, most work has focused on (relatively) data-rich single-document summarization settings. In this paper, we explore highly abstractive multi-document summarization in which the summary is explicitly conditioned on a user-given topic statement or question. We compare the summarization quality produced by three state-of-the-art transformer-based models: BART, T5, and PEGASUS. We report performance on four challenging summarization datasets (three from the general domain and one from consumer health) in both zero-shot and few-shot learning settings. While prior work has shown significant differences in performance for these models on standard summarization tasks, our results indicate that with as few as 10 labeled examples there is no statistically significant difference in summary quality, suggesting the need for more abstractive benchmark collections when determining the state of the art.
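All three models are available through the Hugging Face transformers library; a minimal zero-shot sketch is shown below. The conditioning scheme (prepending the topic statement or question to the concatenated source documents) and the checkpoint names are illustrative assumptions, not necessarily the setup used in the paper.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Illustrative checkpoints; swap in "t5-base" or "google/pegasus-xsum"
# to compare the other two model families.
model_name = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def conditional_summary(question, documents, max_new_tokens=128):
    # Condition the summary by prepending the question/topic to the
    # concatenated source documents (assumed input format).
    text = question + " " + " ".join(documents)
    inputs = tokenizer(text, truncation=True, max_length=1024,
                       return_tensors="pt")
    ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```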
Automatic summarization research has traditionally focused on providing high quality general-purpose summaries of documents. However, there are many applications which require more specific summaries, such as supporting question answering or topic-based literature discovery. In this paper we study the problem of conditional summarization in which content selection and surface realization are explicitly conditioned on an ad-hoc natural language question or topic description. Because of the difficulty in obtaining sufficient reference summaries to support arbitrary conditional summarization, we explore the use of multi-task fine-tuning (MTFT) on twenty-one natural language tasks to enable zero-shot conditional summarization on five tasks. We present four new summarization datasets, two novel “online” or adaptive task-mixing strategies, and report zero-shot performance using T5 and BART, demonstrating that MTFT can improve zero-shot summarization quality.
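The abstract does not detail the two "online" adaptive task-mixing strategies, so the sketch below shows only the simpler static baseline such strategies would extend: temperature-scaled proportional mixing in the style of T5, where each training batch's task is sampled with probability proportional to (smoothed) dataset size. The interface is hypothetical.

```python
import random

def proportional_mixing(datasets, temperature=2.0):
    """Yield (task, example) pairs, sampling each task with probability
    proportional to its dataset size raised to 1/temperature (higher
    temperatures flatten the distribution toward uniform).

    datasets: dict mapping task name -> list of examples
    """
    names = list(datasets)
    sizes = [len(datasets[n]) ** (1.0 / temperature) for n in names]
    total = sum(sizes)
    weights = [s / total for s in sizes]
    while True:
        task = random.choices(names, weights=weights, k=1)[0]
        yield task, random.choice(datasets[task])
```

An adaptive ("online") strategy would instead update these weights during training, e.g. based on per-task validation loss, rather than fixing them up front.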
Our ability to understand language often relies on common-sense knowledge: background information the speaker can assume is known by the reader. Similarly, our comprehension of the language used in complex domains relies on access to domain-specific knowledge. Capturing common-sense and domain-specific knowledge can be achieved by taking advantage of recent advances in open information extraction (IE) techniques and, more importantly, of knowledge embeddings, which are multi-dimensional representations of concepts and relations. Building a knowledge graph for representing common-sense knowledge, in which concepts discerned from noun phrases are cast as vertices and lexicalized relations as edges, makes it possible to learn embeddings of common-sense knowledge that account for semantic compositionality as well as implied knowledge. Common-sense knowledge is acquired from a vast collection of blogs and books as well as from WordNet. Similarly, medical knowledge is learned from two large sets of electronic health records. The evaluation results for these two forms of knowledge are promising: the same knowledge acquisition methodology based on learning knowledge embeddings works well both for common-sense knowledge and for medical knowledge. Interestingly, the common-sense knowledge we acquired was evaluated as less neutral than the medical knowledge, as it often reflected the opinion of its utterer. In addition, the acquired medical knowledge was evaluated as more plausible than the common-sense knowledge, reflecting the complexity of acquiring common-sense knowledge due to the pragmatics and economy of language.
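The abstract does not name the embedding model used, so as an illustration only, here is a minimal sketch in the spirit of translational models such as TransE, together with a naive additive composition for multi-word concepts; both choices are assumptions.

```python
import numpy as np

def transe_score(head, relation, tail):
    # Translational scoring: a triple (h, r, t) is plausible when
    # h + r is close to t, so a smaller distance yields a higher score.
    return -np.linalg.norm(head + relation - tail)

def phrase_embedding(words, emb):
    # Naive additive composition for concepts discerned from noun
    # phrases; one simple way to model semantic compositionality.
    return np.mean([emb[w] for w in words], axis=0)

# Toy example with random vectors standing in for learned embeddings.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["family", "doctor", "is_a", "physician"]}
concept = phrase_embedding(["family", "doctor"], emb)
print(transe_score(concept, emb["is_a"], emb["physician"]))
```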
Electronic Medical Records (EMRs) encode an extraordinary amount of medical knowledge. Collecting and interpreting this knowledge, however, requires a significant level of clinical understanding. Automatically capturing this clinical information is crucial for performing comparative effectiveness research. In this paper, we present a data-driven approach to modeling semantic dependencies between medical concepts, qualified by the beliefs of physicians. The dependencies, captured in a patient cohort graph of clinical pictures and therapies, are further refined into a probabilistic graphical model that enables efficient inference of patient-centered treatment or test recommendations (based on probabilities). To perform inference on the graphical model, we describe a technique for smoothing the conditional likelihood of medical concepts using the belief values of semantically similar concepts. The experimental results, compared against clinical guidelines, are very promising.
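The abstract does not give the smoothing formula; one plausible reading is an interpolation between each concept's raw conditional likelihood and a similarity-weighted average of the likelihoods of semantically similar concepts. The sketch below encodes that reading; the interpolation form and parameter names are assumptions.

```python
import numpy as np

def smooth_likelihoods(p_cond, similarity, alpha=0.5):
    """Interpolate each concept's conditional likelihood with a
    similarity-weighted average over semantically similar concepts.

    p_cond: (n,) raw conditional likelihoods of n medical concepts
    similarity: (n, n) nonnegative semantic-similarity matrix
    alpha: weight on the raw estimate (alpha=1.0 disables smoothing)
    """
    # Row-normalize so each concept's neighbors define a distribution.
    weights = similarity / similarity.sum(axis=1, keepdims=True)
    return alpha * p_cond + (1.0 - alpha) * weights @ p_cond

p = np.array([0.9, 0.1, 0.5])
sim = np.array([[1.0, 0.8, 0.1],
                [0.8, 1.0, 0.1],
                [0.1, 0.1, 1.0]])
print(smooth_likelihoods(p, sim))  # rare concepts borrow mass from neighbors
```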
A significant amount of spatial information in textual documents is hidden within the relationships between events. While humans have an intuitive understanding of these relationships that allows us to recover an object's or event's location, no annotated data currently exists to enable the automatic discovery of spatial containment relations between events. We present our process for building such a corpus of manually annotated spatial relations between events. Events form complex predicate-argument structures that model the participants in the event, their roles, and their temporal and spatial grounding. In addition, events are not presented in isolation in text; there are explicit and implicit interactions between events that often participate in event structures. In this paper, we focus on five spatial containment relations that may exist between events: (1) SAME, (2) CONTAINS, (3) OVERLAPS, (4) NEAR, and (5) DIFFERENT. Using the transitive closure across these spatial relations, the implicit location of many events and their participants can be discovered. We discuss our annotation schema for spatial containment relations, placing it within pre-existing theories of spatial representation. We also discuss our annotation guidelines for maintaining annotation quality, as well as our process for augmenting SpatialML with spatial containment relations between events. Additionally, we outline baseline experiments to evaluate the feasibility of developing supervised systems based on this corpus. These results indicate that although the task is challenging, automated methods are capable of discovering spatial containment relations between events.
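To make the transitive-closure step concrete: if event A CONTAINS event B and B CONTAINS C, then A CONTAINS C can be inferred even though it is never stated. Below is a minimal fixpoint computation over CONTAINS pairs; representing relations as pairs of event identifiers is an assumption for illustration.

```python
def containment_closure(pairs):
    """Transitive closure of CONTAINS relations between events:
    (a, b) means event a spatially contains event b.
    """
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

annotated = {("parade", "march"), ("march", "speech")}
print(containment_closure(annotated))
# Infers ("parade", "speech") in addition to the annotated pairs.
```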