This paper presents detailed mappings between the structures used in Abstract Meaning Representation (AMR) and those used in Uniform Meaning Representation (UMR). These structures include general semantic roles, rolesets, and concepts that are largely shared between AMR and UMR, but with crucial differences. While UMR annotation of new low-resource languages is ongoing, AMR-annotated corpora already exist for many languages, and these AMR corpora are ripe for conversion to UMR format. Rather than focusing on semantic coverage that is new to UMR (which will likely need to be dealt with manually), this paper serves as a resource (with illustrated mappings) for users looking to understand the fine-grained adjustments that have been made to the representation techniques for semantic categories present in both AMR and UMR.
We introduce RESIN-11, a new schema-guided event extraction and prediction framework that can be applied to a large variety of newsworthy scenarios. The framework consists of two parts: (1) an open-domain end-to-end multimedia multilingual information extraction system with weak-supervision and zero-shot learning-based techniques; and (2) schema matching and schema-guided event prediction based on our curated schema library. We build a demo website based on our dockerized system and make the system and schema library publicly available for installation (https://github.com/RESIN-KAIROS/RESIN-11). We also include a video demonstrating the system.
As vision processing and natural language processing continue to advance, there is increasing interest in multimodal applications, such as image retrieval, caption generation, and human-robot interaction. These tasks require close alignment between the information in the images and text. In this paper, we present a new multimodal dataset that combines state-of-the-art semantic annotation for language with the bounding boxes of corresponding images. This richer multimodal labeling supports cross-modal inference for applications in which such alignment is useful. Our semantic representations, developed in the natural language processing community, abstract away from the surface structure of the sentence, focusing on specific actions and the roles of their participants, a level that is equally relevant to images. We then utilize these representations in the form of semantic role labels in the captions and the images and demonstrate improvements in standard tasks such as image retrieval. The potential contributions of these additional labels are evaluated using a role-aware retrieval system based on graph convolutional and recurrent neural networks. The addition of semantic roles to this system provides a significant increase in capability and greater flexibility for these tasks, and could be extended to state-of-the-art techniques relying on transformers with larger amounts of annotated data.
This paper describes the evolution of the PropBank approach to semantic role labeling over the last two decades. During this time the PropBank frame files have been expanded to include non-verbal predicates such as adjectives, prepositions and multi-word expressions. The number of domains, genres and languages that have been PropBanked has also expanded greatly, creating an opportunity for much more challenging and robust testing of the generalization capabilities of PropBank semantic role labeling systems. We also describe the substantial effort that has gone into ensuring the consistency and reliability of the various annotated datasets and resources, to better support the training and evaluation of such systems.
Claim detection and verification are crucial for news understanding and have emerged as promising technologies for mitigating misinformation and disinformation in the news. However, most existing work has focused on claim sentence analysis while overlooking additional crucial attributes (e.g., the claimer and the main object associated with the claim). In this work, we present NewsClaims, a new benchmark for attribute-aware claim detection in the news domain. We extend the claim detection problem to include extraction of additional attributes related to each claim and release 889 claims annotated over 143 news articles. NewsClaims aims to benchmark claim detection systems in emerging scenarios, comprising unseen topics with little or no training data. To this end, we see that zero-shot and prompt-based baselines show promising performance on this benchmark, while still lagging considerably behind human performance.
This tutorial reviews the design of common meaning representations, SoTA models for predicting meaning representations, and the applications of meaning representations in a wide range of downstream NLP tasks and real-world applications. Presented by a diverse team of NLP researchers from academia and industry with extensive experience in designing, building and using meaning representations, our tutorial has three components: (1) an introduction to common meaning representations, including basic concepts and design challenges; (2) a review of SoTA methods for building models of meaning representations; and (3) an overview of applications of meaning representations in downstream NLP tasks and real-world applications. We will also present qualitative comparisons of common meaning representations and a quantitative study on how their differences impact model performance. Finally, we will share best practices in choosing the right meaning representation for downstream tasks.
Biomedical Information Extraction from scientific literature presents two unique and non-trivial challenges. First, compared with general natural language texts, sentences from scientific papers usually possess wider contexts between knowledge elements. Moreover, comprehending the fine-grained scientific entities and events urgently requires domain-specific background knowledge. In this paper, we propose a novel biomedical Information Extraction (IE) model to tackle these two challenges and extract scientific entities and events from English research papers. We apply Abstract Meaning Representation (AMR) parsing to compress the wide context and uncover a clear semantic structure for each complex sentence. In addition, we construct the sentence-level knowledge graph from an external knowledge base and use it to enrich the AMR graph to improve the model’s understanding of complex scientific concepts. We use an edge-conditioned graph attention network to encode the knowledge-enriched AMR graph for biomedical IE tasks. Experiments on the GENIA 2011 dataset show that the AMR and external knowledge have contributed 1.8% and 3.0% absolute F-score gains respectively. In order to evaluate the impact of our approach on real-world problems that involve topic-specific fine-grained knowledge elements, we have also created a new ontology and annotated corpus for entity and event extraction for the COVID-19 scientific literature, which can serve as a new benchmark for the biomedical IE community.
Much past work has focused on extracting information like events, entities, and relations from documents. Very little work has focused on analyzing these results for better model understanding. In this paper, we introduce a curation interface that takes an Information Extraction (IE) system’s output in a pre-defined format and generates a graphical representation of its elements. The interface supports editing while curating schemas for complex events like Improvised Explosive Device (IED) based scenarios. We identify various schemas that either have linear event chains or contain parallel events with complicated temporal ordering. We iteratively update an induced schema to uniquely identify events specific to it, add optional events around them, and prune unnecessary events. The resulting schemas are improved and enriched versions of the machine-induced versions.
To combat COVID-19, both clinicians and scientists need to digest the vast amount of relevant biomedical knowledge in the literature to understand the disease mechanism and the related biological functions. We have developed a novel and comprehensive knowledge discovery framework, COVID-KG, to extract fine-grained multimedia knowledge elements (entities, relations and events) from scientific literature. We then exploit the constructed multimedia knowledge graphs (KGs) for question answering and report generation, using drug repurposing as a case study. Our framework also provides detailed contextual sentences, subfigures, and knowledge subgraphs as evidence. All of the data, KGs, and reports are made publicly available.
We present a new information extraction system that can automatically construct temporal event graphs from a collection of news documents from multiple sources, multiple languages (English and Spanish for our experiment), and multiple data modalities (speech, text, image and video). The system advances the state of the art in two respects: (1) extending from sentence-level event extraction to cross-document, cross-lingual, cross-media event extraction, coreference resolution and temporal event tracking; (2) using a human-curated event schema library to match and enhance the extraction output. We have made the dockerized system publicly available for research purposes on GitHub, with a demo video.
Acquiring training data for natural language processing systems can be expensive and time-consuming. Given a few training examples crafted by experts, large corpora can be mined for thousands of semantically similar examples that provide useful variability to improve model generalization. We present TopGuNN, a fast contextualized k-NN retrieval system that can efficiently index and search over contextual embeddings generated from large corpora. TopGuNN is demonstrated for a training data augmentation use case over the Gigaword corpus. Using approximate k-NN and an efficient architecture, TopGuNN performs queries over an embedding space of 4.63TB (approximately 1.5B embeddings) in less than a day.
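A minimal sketch of the retrieval recipe described above: index contextual token embeddings with an approximate-nearest-neighbor library and query them with an example token. The BERT encoder, FAISS index type, and toy corpus are illustrative assumptions rather than TopGuNN's actual configuration.

```python
# Sketch: index contextual word embeddings with FAISS and query by an example token.
# The model choice and index parameters are illustrative, not TopGuNN's actual setup.
import numpy as np
import faiss
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")

def embed_tokens(sentence):
    """One contextual vector per word-piece token, excluding [CLS]/[SEP]."""
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**inputs).last_hidden_state[0]
    return hidden[1:-1].numpy()

corpus = ["The senator announced her resignation.",
          "The company declared bankruptcy on Tuesday."]
vectors, metadata = [], []
for sid, sent in enumerate(corpus):
    for tid, vec in enumerate(embed_tokens(sent)):
        vectors.append(vec)
        metadata.append((sid, tid))          # row id -> (sentence, token position)

matrix = np.stack(vectors).astype("float32")
faiss.normalize_L2(matrix)                   # cosine similarity via inner product
index = faiss.IndexFlatIP(matrix.shape[1])   # exact search; swap in IndexIVFPQ at scale
index.add(matrix)

query = embed_tokens("The mayor announced a new policy.")[2:3].astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)
print([(metadata[i], float(s)) for i, s in zip(ids[0], scores[0]) if i != -1])
```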
Despite recent advances in semantic role labeling propelled by pre-trained text encoders like BERT, performance lags behind when applied to predicates observed infrequently during training or to sentences in new domains. In this work, we investigate how role labeling performance on low-frequency predicates and out-of-domain data can be further improved by using VerbNet, a verb lexicon that groups verbs into hierarchical classes based on shared syntactic and semantic behavior and defines semantic representations describing relations between arguments. We find that VerbNet classes provide an effective level of abstraction, improving generalization on low-frequency predicates by allowing the model to learn from the training examples of other predicates belonging to the same class. We also find that joint training of VerbNet role labeling and predicate disambiguation of VerbNet classes for polysemous verbs leads to improvements in both tasks, naturally supporting the extraction of VerbNet’s semantic representations.
Active learning has been shown to reduce annotation requirements for numerous natural language processing tasks, including semantic role labeling (SRL). SRL involves labeling argument spans for potentially multiple predicates in a sentence, which makes it challenging to aggregate the numerous decisions into a single score for determining new instances to annotate. In this paper, we apply two ways of aggregating scores across multiple predicates in order to choose query sentences with two methods of estimating model certainty: using the neural network’s outputs and using dropout-based Bayesian Active Learning by Disagreement. We compare these methods with three passive baselines — random sentence selection, random whole-document selection, and selecting sentences with the most predicates — and analyse the effect these strategies have on the learning curve with respect to reducing the number of annotated sentences and predicates to achieve high performance.
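As a sketch of how per-predicate uncertainty could be aggregated into a sentence-level acquisition score, the snippet below estimates a BALD-style score with Monte Carlo dropout and combines scores across predicates; the model interface and the mean/max aggregators are assumptions rather than the paper's exact setup.

```python
# Sentence-level acquisition scoring for SRL active learning: estimate a BALD-style
# uncertainty per predicate with Monte Carlo dropout, then aggregate across predicates.
import torch

def mc_dropout_bald(model, inputs, predicate_index, n_samples=10):
    model.train()                                            # keep dropout active at inference
    samples = []
    with torch.no_grad():
        for _ in range(n_samples):
            logits = model(inputs, predicate_index)          # (seq_len, num_roles), assumed API
            samples.append(torch.softmax(logits, dim=-1))
    probs = torch.stack(samples)                             # (n_samples, seq_len, num_roles)
    mean_p = probs.mean(dim=0)
    h_of_mean = -(mean_p * mean_p.clamp_min(1e-12).log()).sum(-1)
    mean_of_h = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean(0)
    return (h_of_mean - mean_of_h).mean().item()             # BALD, averaged over tokens

def sentence_score(model, inputs, predicate_indices, aggregate="max"):
    scores = [mc_dropout_bald(model, inputs, p) for p in predicate_indices]
    return max(scores) if aggregate == "max" else sum(scores) / len(scores)

# Rank the unlabeled pool and send the highest-scoring sentences to annotators:
# ranked = sorted(pool, key=lambda s: sentence_score(model, s.inputs, s.predicates),
#                 reverse=True)
```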
The SemLink resource provides mappings between a variety of lexical semantic ontologies, each with their strengths and weaknesses. To take advantage of these differences, the ability to move between resources is essential. This work describes advances made to improve the usability of the SemLink resource: the automatic addition of new instances and mappings, manual corrections, sense-based vectors and collocation information, and architecture built to automatically update the resource when versions of the underlying resources change. These updates improve coverage, provide new tools to leverage the capabilities of these resources, and facilitate seamless updates, ensuring the consistency and applicability of these mappings in the future.
We present AutoAspect, a novel, rule-based annotation tool for labeling tense and aspect. The pilot version annotates English data. The aspect labels are designed specifically for Uniform Meaning Representations (UMR), an annotation schema that aims to encode crosslingual semantic information. The annotation tool combines syntactic and semantic cues to assign aspects on a sentence-by-sentence basis, following a sequence of rules that each output a UMR aspect. Identified events proceed through the sequence until they are assigned an aspect. We achieve a recall of 76.17% for identifying UMR events and an accuracy of 62.57% on all identified events, with high precision values for 2 of the aspect labels.
Tracking entity states is a natural language processing task assumed to require human annotation. In order to reduce the time and expenses associated with annotation, we introduce a new method to automatically extract entity states, including location and existence state of entities, following Dalvi et al. (2018) and Tandon et al. (2020). For this purpose, we rely primarily on the semantic representations generated by the state-of-the-art VerbNet parser (Gung, 2020), and extract the entities (event participants) and their states based on the semantic predicates of the generated VerbNet semantic representation, which is in propositional logic format. For evaluation, we used ProPara (Dalvi et al., 2018), a reading comprehension dataset which is annotated with entity states in each sentence, and tracks those states in paragraphs of natural human-authored procedural texts. Given the presented limitations of the method, the peculiarities of the ProPara dataset annotations, and the fact that our system, Lexis, makes no use of task-specific training data and relies solely on VerbNet, the results are promising, showcasing the value of lexical resources.
Abstract Meaning Representations (AMRs), a syntax-free representation of phrase semantics, are useful for capturing the meaning of a phrase and reflecting the relationships between the concepts that are referred to. However, annotating AMRs is time-consuming and expensive. The existing annotation process requires expertly trained workers who have knowledge of an extensive set of guidelines for parsing phrases. In this paper, we propose a cost-saving two-step process for the creation of a corpus of AMR-phrase pairs for spatial referring expressions. The first step uses non-specialists to perform simple annotations that can be leveraged in the second step to accelerate the annotation performed by the experts. We hypothesize that our process will decrease the cost per annotation and improve consistency across annotators. Few corpora of spatial referring expressions exist, and the resulting language resource will be valuable for referring expression comprehension and generation modeling.
We present refinements over existing temporal relation annotations in the Electronic Medical Record clinical narrative. We refined the THYME corpus annotations to more faithfully represent nuanced temporality and nuanced temporal-coreferential relations. The main contributions are in re-defining CONTAINS and OVERLAP relations into CONTAINS, CONTAINS-SUBEVENT, OVERLAP and NOTED-ON. We demonstrate that these refinements lead to substantial gains in learnability for state-of-the-art transformer models as compared to previously reported results on the original THYME corpus. We thus establish a baseline for the automatic extraction of these refined temporal relations. Although our study is done on clinical narrative, we believe it addresses far-reaching challenges that are corpus- and domain-agnostic.
Recent neural network-driven semantic role labeling (SRL) systems have shown impressive improvements in F1 scores. These improvements are due to expressive input representations, which, at least at the surface, are orthogonal to knowledge-rich constrained decoding mechanisms that helped linear SRL models. Introducing the benefits of structure to inform neural models presents a methodological challenge. In this paper, we present a structured tuning framework to improve models using softened constraints only at training time. Our framework leverages the expressiveness of neural networks and provides supervision with structured loss components. We start with a strong baseline (RoBERTa) to validate the impact of our approach, and show that our framework outperforms the baseline by learning to comply with declarative constraints. Additionally, our experiments with smaller training sizes show that we can achieve consistent improvements under low-resource scenarios.
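To illustrate what a softened constraint can look like at training time, the sketch below adds a penalty on the expected number of duplicate core roles for a predicate to the usual cross-entropy loss; the specific constraint and the hinge formulation are assumptions, not the paper's exact loss.

```python
# Sketch of a "softened constraint" training loss for SRL: alongside cross-entropy,
# penalize (in expectation) violations of a unique-core-role constraint, i.e. one
# predicate should not receive two spans with the same core role.
import torch
import torch.nn.functional as F

def unique_role_penalty(role_log_probs, core_role_ids):
    """role_log_probs: (num_spans, num_roles) log-probabilities for one predicate."""
    probs = role_log_probs.exp()
    penalty = 0.0
    for r in core_role_ids:
        expected_count = probs[:, r].sum()               # soft count of spans labeled r
        penalty = penalty + F.relu(expected_count - 1.0) # non-zero only if count > 1
    return penalty

def training_loss(role_log_probs, gold_roles, core_role_ids, rho=0.1):
    ce = F.nll_loss(role_log_probs, gold_roles)
    return ce + rho * unique_role_penalty(role_log_probs, core_role_ids)
```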
This paper presents an expansion to the Abstract Meaning Representation (AMR) annotation schema that captures fine-grained semantically and pragmatically derived spatial information in grounded corpora. We describe a new lexical category conceptualization and set of spatial annotation tools built in the context of a multimodal corpus consisting of 170 3D structure-building dialogues between a human architect and human builder in Minecraft. Minecraft provides a particularly beneficial spatial relation-elicitation environment because it automatically tracks locations and orientations of objects and avatars in the space according to an absolute Cartesian coordinate system. Through a two-step process of sentence-level and document-level annotation designed to capture implicit information, we leverage these coordinates and bearings in the AMRs in combination with spatial framework annotation to ground the spatial language in the dialogues to absolute space.
Spatial reasoning from language is essential for natural language understanding. Supporting it requires a representation scheme that can capture spatial phenomena encountered in language as well as in images and videos. Existing spatial representations are not sufficient for describing spatial configurations used in complex tasks. This paper extends the capabilities of existing spatial representation languages and increases coverage of the semantic aspects that are needed to ground the spatial meaning of natural language text in the world. Our spatial relation language is able to represent a large, comprehensive set of spatial concepts crucial for reasoning and is designed to support composition of static and dynamic spatial configurations. We integrate this language with the Abstract Meaning Representation (AMR) annotation schema and present a corpus annotated by this extended AMR. To exhibit the applicability of our representation scheme, we annotate text taken from diverse datasets and show how we extend the capabilities of existing spatial representation languages with fine-grained decomposition of semantics and blend it seamlessly with AMRs of sentences and discourse representations as a whole.
This paper presents a proposition bank for Russian (RuPB), a resource for semantic role labeling (SRL). The motivating goal for this resource is to automatically project semantic role labels from English to Russian. This paper describes frame creation strategies, coverage, and the process of sense disambiguation. It discusses language-specific issues that complicated the process of building the PropBank and how these challenges were exploited as language-internal guidance for consistency and coherence.
In the field of metaphor detection, deep learning systems are ubiquitous and achieve strong performance on many tasks. However, due to the complicated procedures for manually identifying metaphors, the datasets available are relatively small and fraught with complications. We show that using syntactic features and lexical resources can automatically provide additional high-quality training data for metaphoric language, and this data can cover gaps and inconsistencies in metaphor annotation, improving state-of-the-art word-level metaphor identification. This novel application of automatically improving training data improves classification across numerous tasks, and reconfirms the necessity of high-quality data for deep learning frameworks.
This paper proposes using a Bidirectional LSTM-CRF model in order to identify the tense and aspect of verbs. The information that this classifier outputs can be useful for ordering events and can provide a pre-processing step to improve efficiency of annotating this type of information. This neural network architecture has been successfully employed for other sequential labeling tasks, and we show that it significantly outperforms the rule-based tool TMV-annotator on the Propbank I dataset.
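A minimal version of such a tagger, sketched with PyTorch and the pytorch-crf package; the dimensions and tag inventory are placeholders rather than the configuration used in the paper.

```python
# Minimal BiLSTM-CRF sequence tagger of the kind described above.
import torch
import torch.nn as nn
from torchcrf import CRF   # pip install pytorch-crf

class BiLSTMCRFTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim // 2, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def emissions(self, token_ids):
        out, _ = self.lstm(self.embed(token_ids))
        return self.proj(out)                               # (batch, seq_len, num_tags)

    def loss(self, token_ids, tags, mask):
        # mask is a boolean tensor marking real (non-padding) tokens
        return -self.crf(self.emissions(token_ids), tags, mask=mask, reduction="mean")

    def decode(self, token_ids, mask):
        return self.crf.decode(self.emissions(token_ids), mask=mask)
```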
This paper announces the release of a new version of the English lexical resource VerbNet with substantially revised semantic representations designed to facilitate computer planning and reasoning based on human language. We use the transfer of possession and transfer of information event representations to illustrate both the general framework of the representations and the types of nuances the new representations can capture. These representations use a Generative Lexicon-inspired subevent structure to track attributes of event participants across time, highlighting oppositions and temporal and causal relations among the subevents.
The vast amount of research introducing new corpora and techniques for semi-automatically annotating corpora shows the important role that datasets play in today’s research, especially in the machine learning community. This rapid development raises concerns about the quality of the datasets created and consequently of the models trained, as recently discussed with respect to the Natural Language Inference (NLI) task. In this work we conduct an annotation experiment based on a small subset of the SICK corpus. The experiment reveals several problems in the annotation guidelines, and various challenges of the NLI task itself. Our quantitative evaluation of the experiment allows us to assign our empirical observations to specific linguistic phenomena and leads us to recommendations for future annotation tasks, for NLI and possibly for other tasks.
Verbs play a fundamental role in many biomedical tasks and applications such as relation and event extraction. We hypothesize that performance on many downstream tasks can be improved by aligning the input pretrained embeddings according to semantic verb classes. In this work, we show that by using semantic clusters for verbs, a large lexicon of verb classes derived from biomedical literature, we are able to improve the performance of common pretrained embeddings in downstream tasks by retrofitting them to verb classes. We present a simple and computationally efficient approach using a widely-available “off-the-shelf” retrofitting algorithm to align pretrained embeddings according to semantic verb clusters. We achieve state-of-the-art results on text classification and relation extraction tasks.
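The retrofitting step can be sketched as the widely used Faruqui et al. (2015)-style update applied to verb clusters; treating all cluster-mates as uniformly weighted neighbors is a simplifying assumption for illustration.

```python
# Sketch of retrofitting pretrained vectors toward their verb-cluster neighbors.
import numpy as np

def retrofit(embeddings, clusters, iterations=10, alpha=1.0, beta=1.0):
    """embeddings: dict word -> np.ndarray; clusters: list of lists of verbs."""
    new_vecs = {w: v.copy() for w, v in embeddings.items()}
    neighbors = {w: set() for w in embeddings}
    for cluster in clusters:
        members = [w for w in cluster if w in embeddings]
        for w in members:
            neighbors[w].update(m for m in members if m != w)
    for _ in range(iterations):
        for w, nbrs in neighbors.items():
            if not nbrs:
                continue
            # Pull each verb toward the average of its cluster-mates while staying
            # anchored to its original pretrained vector.
            pull = sum(new_vecs[n] for n in nbrs)
            new_vecs[w] = (alpha * embeddings[w] + beta * pull) / (alpha + beta * len(nbrs))
    return new_vecs
```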
Previous work on light verb constructions (e.g. chorii kar ‘theft do; steal’) in Hindi describes their syntactic formation via co-predication (Ahmed et al., 2012, Butt, 2014). This implies that both noun and light verb contribute their arguments, and these overlapping argument structures must be composed in the syntax. In this paper, we present a co-predication analysis using Tree-Adjoining Grammar, which models syntactic composition and semantic selectional preferences without transformations (deletion or argument identification). The analysis has two key components (i) an underspecified category for the nominal and (ii) combinatorial constraints on the noun and light verb to specify selectional preferences. The former has the advantage of syntactic composition without argument identification and the latter prevents over-generalization, while recognizing the semantic contribution of both predicates. This work additionally accounts for the agreement facts for the Hindi LVC.
This paper discusses a cross-document coreference annotation schema that was developed to further automatic extraction of timelines in the clinical domain. Lexical senses and coreference choices are determined largely by context, but cross-document work requires reasoning across contexts that are not necessarily coherent. We found that an annotation approach that relies less on context-guided annotator intuitions and more on schematic rules was most effective in creating meaningful and consistent cross-document relations.
This paper presents the outcomes of the Parsing Time Normalization shared task held within SemEval-2018. The aim of the task is to parse time expressions into the compositional semantic graphs of the Semantically Compositional Annotation of Time Expressions (SCATE) schema, which allows the representation of a wider variety of time expressions than previous approaches. Two tracks were included, one to evaluate the parsing of individual components of the produced graphs, in a classic information extraction way, and another one to evaluate the quality of the time intervals resulting from the interpretation of those graphs. Though 40 participants registered for the task, only one team submitted output, achieving 0.55 F1 in Track 1 (parsing) and 0.70 F1 in Track 2 (intervals).
Commonsense, real-world knowledge about the events that entities or “things in the world” are typically involved in, as well as part-whole relationships, is valuable for allowing computational systems to draw everyday inferences about the world. Here, we focus on automatically extracting information about (1) the events that typically bring about certain entities (origins), (2) the events that are the typical functions of entities, and (3) part-whole relationships in entities. These correspond to the agentive, telic and constitutive qualia central to the Generative Lexicon. We describe our motivations and methods for extracting these qualia relations from the Suggested Upper Merged Ontology (SUMO) and show that human annotators overwhelmingly find the information extracted to be reasonable. Because ontologies provide a way of structuring this information and making it accessible to agents and computational systems generally, efforts are underway to incorporate the extracted information into an ontology hub of Natural Language Processing semantic role labeling resources, the Rich Event Ontology.
There are few corpora that endeavor to represent the semantic content of entire documents. We present a corpus that accomplishes one way of capturing document level semantics, by annotating coreference and similar phenomena (bridging and implicit roles) on top of gold Abstract Meaning Representations of sentence-level semantics. We present a new corpus of this annotation, with analysis of its quality, alongside a plausible baseline for comparison. It is hoped that this Multi-Sentence AMR corpus (MS-AMR) may become a feasible method for developing rich representations of document meaning, useful for tasks such as information extraction and question answering.
Identification of metaphoric language in text is critical for generating effective semantic representations for natural language understanding. Computational approaches to metaphor identification have largely relied on heuristic-based models or feature-based machine learning, using hand-crafted lexical resources coupled with basic syntactic information. However, recent work has shown the predictive power of syntactic constructions in determining metaphoric source and target domains (Sullivan 2013). Our work intends to explore syntactic constructions and their relation to metaphoric language. We undertake a corpus-based analysis of predicate-argument constructions and their metaphoric properties, and attempt to effectively represent syntactic constructions as features for metaphor processing, both in identifying source and target domains and in distinguishing metaphoric words from non-metaphoric ones.
A large amount of social media data is generated during natural disasters, and identifying the relevant portions of this data is critical for researchers attempting to understand human behavior, the effects of information sources, and preparatory actions undertaken during these events. In order to classify human behavior during hazard events, we employ machine learning for two tasks: identifying hurricane related tweets and classifying user evacuation behavior during hurricanes. We show that feature-based and deep learning methods provide different benefits for tweet classification, and ensemble-based methods using linguistic, temporal, and geospatial features can effectively classify user behavior.
When a hazard such as a hurricane threatens, people are forced to make a wide variety of decisions, and the information they receive and produce can influence their own and others’ actions. As social media grows more popular, an increasing number of people are using social media platforms to obtain and share information about approaching threats and discuss their interpretations of the threat and their protective decisions. This work aims to improve understanding of natural disasters through social media and provide an annotation scheme to identify themes in users’ social media behavior and facilitate efforts in supervised machine learning. To that end, this work has three contributions: (1) the creation of an annotation scheme to consistently identify hazard-related themes on Twitter, (2) an overview of agreement rates and difficulties in identifying annotation categories, and (3) a public release of both the dataset and guidelines developed from this scheme.
In this paper we describe a new lexical semantic resource, The Rich Event Ontology, which provides an independent conceptual backbone to unify existing semantic role labeling (SRL) schemas and augment them with event-to-event causal and temporal relations. By unifying the FrameNet, VerbNet, Automatic Content Extraction, and Rich Entities, Relations and Events resources, the ontology serves as a shared hub for the disparate annotation schemas and therefore enables the combination of SRL training data into a larger, more diverse corpus. By adding temporal and causal relational information not found in any of the independent resources, the ontology facilitates reasoning on and across documents, revealing relationships between events that come together in temporal and causal chains to build more complex scenarios. We envision the open resource serving as a valuable tool for both moving from the ontology to text to query for event types and scenarios of interest, and for moving from text to the ontology to access interpretations of events using the combined semantic information housed there.
Agents that communicate back and forth with humans to help them execute non-linguistic tasks are a long sought goal of AI. These agents need to translate between utterances and actionable meaning representations that can be interpreted by task-specific problem solvers in a context-dependent manner. They should also be able to learn such actionable interpretations for new predicates on the fly. We define an agent architecture for this scenario and present a series of experiments in the Blocks World domain that illustrate how our architecture supports language learning and problem solving in this domain.
Clinical TempEval 2017 aimed to answer the question: how well do systems trained on annotated timelines for one medical condition (colon cancer) perform in predicting timelines on another medical condition (brain cancer)? Nine sub-tasks were included, covering problems in time expression identification, event expression identification and temporal relation identification. Participant systems were evaluated on clinical and pathology notes from Mayo Clinic cancer patients, annotated with an extension of TimeML for the clinical domain. 11 teams participated in the tasks, with the best systems achieving F1 scores above 0.55 for time expressions, above 0.70 for event expressions, and above 0.40 for temporal relations. Most tasks observed about a 20 point drop over Clinical TempEval 2016, where systems were trained and evaluated on the same domain (colon cancer).
In this paper, we introduce an Abstract Meaning Representation (AMR) to Dependency Parse aligner. Alignment is a preliminary step for AMR parsing, and our aligner improves current AMR parser performance. Our aligner incorporates several different features, including named entity tags and semantic role labels, and uses Expectation-Maximization training. Results show that our aligner reaches an F-score of 87.1% on the experimental data, and enhances AMR parsing.
High accuracy in automated translation and information retrieval calls for linguistic annotations at various language levels. The plethora of informal internet content has sparked demand for porting state-of-the-art natural language processing (NLP) applications to new social media as well as to diverse languages. An effort launched by the BOLT (Broad Operational Language Translation) program at DARPA (Defense Advanced Research Projects Agency) successfully addressed this informal internet content with enhanced NLP systems. BOLT aims for automated translation and linguistic analysis of informal genres of text and speech in online and in-person communication. As a part of this program, the Linguistic Data Consortium (LDC) developed valuable linguistic resources in support of the training and evaluation of such new technologies. This paper focuses on methodologies, infrastructure, and procedures for developing linguistic annotation at various language levels, including Treebank (TB), word alignment (WA), PropBank (PB), and co-reference (CoRef). Inspired by the OntoNotes approach, with adaptations to the tasks to reflect the goals and scope of the BOLT project, this effort has introduced more annotation types for informal and free-style genres in English, Chinese and Egyptian Arabic. The corpus produced is by far the largest multi-lingual, multi-level and multi-genre annotation corpus of informal text and speech.
This paper describes our efforts for the development of a Proposition Bank for Urdu, an Indo-Aryan language. Our primary goal is the labeling of syntactic nodes in the existing Urdu dependency Treebank with specific argument labels. In essence, it involves annotation of predicate argument structures of both simple and complex predicates in the Treebank corpus. We describe the overall process of building the PropBank of Urdu. We discuss various statistics pertaining to the Urdu PropBank and the issues which the annotators encountered while developing the PropBank. We also discuss how these challenges were addressed to successfully expand the PropBank corpus. While reporting the inter-annotator agreement between the two annotators, we show that the annotators share a similar understanding of the annotation guidelines and of the linguistic phenomena present in the language. The present size of this PropBank is around 180,000 tokens, which have been double-propbanked by the two annotators for simple predicates. Another 100,000 tokens have been annotated for complex predicates of Urdu.
Recent efforts have focused on expanding the annotation coverage of PropBank from verb relations to adjective and noun relations, as well as light verb constructions (e.g., make an offer, take a bath). While each new relation type has presented unique annotation challenges, ensuring consistent and comprehensive annotation of light verb constructions has proved particularly challenging, given that light verb constructions are semi-productive, difficult to define, and there are often borderline cases. This research describes the iterative process of developing PropBank annotation guidelines for light verb constructions, the current guidelines, and a comparison to related resources.
Light verb constructions (LVCs) in Hindi are highly productive. Distinguishing a case such as nirnay lenaa ‘decision take; decide’ from an ordinary verb-argument combination such as kaagaz lenaa ‘paper take; take (a) paper’ has been shown to aid NLP applications such as parsing (Begum et al., 2011) and machine translation (Pal et al., 2011). In this paper, we propose an LVC identification system using language-specific features for Hindi which shows an improvement over previous work (Begum et al., 2011). To build our system, we carry out a linguistic analysis of Hindi LVCs using Hindi Treebank annotations and propose two new features that are aimed at capturing the diversity of Hindi LVCs in the corpus. We find that our model performs robustly across a diverse range of LVCs and our results underscore the importance of semantic features, which is in keeping with the findings for English. Our error analysis also demonstrates that our classifier can be used to further refine LVC annotations in the Hindi Treebank and make them more consistent across the board.
This research focuses on expanding PropBank, a corpus annotated with predicate argument structures, with new predicate types; namely, noun, adjective and complex predicates, such as Light Verb Constructions. This effort is in part inspired by a sister project to PropBank, the Abstract Meaning Representation project, which also attempts to capture who is doing what to whom in a sentence, but does so in a way that abstracts away from syntactic structures. For example, alternate realizations of a ‘destroying’ event in the form of either the verb ‘destroy’ or the noun ‘destruction’ would receive the same Abstract Meaning Representation. In order for PropBank to reach the same level of coverage and continue to serve as the bedrock for Abstract Meaning Representation, predicate types other than verbs, which have previously gone without annotation, must be annotated. This research describes the challenges therein, including the development of new annotation practices that walk the line between abstracting away from language-particular syntactic facts to explore deeper semantics, and maintaining the connection between semantics and syntactic structures that has proven to be very valuable for PropBank as a corpus of training data for Natural Language Processing applications.
Abstract Meaning Representations (AMRs) are rooted, directed and labeled graphs that abstract away from morpho-syntactic idiosyncrasies such as word category (verbs and nouns), word order, and function words (determiners, some prepositions). Because these syntactic idiosyncrasies account for many of the cross-lingual differences, it would be interesting to see if this representation can serve, e.g., as a useful, minimally divergent transfer layer in machine translation. To answer this question, we have translated 100 English sentences that have existing AMRs into Chinese and Czech to create AMRs for them. A cross-linguistic comparison of English to Chinese and Czech AMRs reveals both cases where the AMRs for the language pairs align well structurally and cases of linguistic divergence. We found that the level of compatibility of AMR between English and Chinese is higher than between English and Czech. We believe this kind of comparison is beneficial for further refining the annotation standards for each of the three languages and will lead to more compatible annotation guidelines between the languages.
While natural language processing performance has been improved through the recognition that there is a relationship between the semantics of the verb and the syntactic context in which the verb is realized, sentences where the verb does not conform to the expected syntax-semantic patterning behavior remain problematic. For example, in the sentence The crowd laughed the clown off the stage, a verb of non-verbal communication, laugh, is used in a caused motion construction and gains a motion entailment that is atypical given its inherent lexical semantics. This paper focuses on our efforts at defining the semantic types and varieties of caused motion constructions (CMCs) through an iterative annotation process and establishing annotation guidelines based on these criteria to aid in the production of a consistent and reliable annotation. The annotation will serve as training and test data for classifiers for CMCs, and the CMC definitions developed throughout this study will be used in extending VerbNet to handle representations of sentences in which a verb is used in a syntactic context that is atypical for its lexical semantics.
In this paper we present an alignment experiment between patterns of verb use discovered by Corpus Pattern Analysis (CPA; Hanks 2004, 2008, 2012) and verb senses in OntoNotes (ON; Hovy et al. 2006, Weischedel et al. 2011). We present a probabilistic approach for mapping one resource into the other. Firstly we introduce a basic model, based on conditional probabilities, which determines for any given sentence the best CPA pattern match. On the basis of this model, we propose a joint source channel model (JSCM) that computes the probability of compatibility of semantic types between a verb phrase and a pattern, irrespective of whether the verb phrase is a norm or an exploitation. We evaluate the accuracy of the proposed mapping using cluster similarity metrics based on entropy.
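A minimal count-based version of the basic conditional-probability mapping is sketched below (the joint source channel model is not reproduced); the smoothing constant and input format are assumptions.

```python
# Estimate P(ON sense | CPA pattern) from doubly labeled sentences for one verb, then
# map each pattern to its most probable sense.
from collections import defaultdict

def learn_mapping(labeled_sentences, smoothing=0.1):
    """labeled_sentences: iterable of (cpa_pattern, on_sense) pairs for one verb."""
    counts = defaultdict(lambda: defaultdict(float))
    senses = set()
    for pattern, sense in labeled_sentences:
        counts[pattern][sense] += 1.0
        senses.add(sense)
    mapping = {}
    for pattern, sense_counts in counts.items():
        total = sum(sense_counts.values()) + smoothing * len(senses)
        probs = {s: (sense_counts.get(s, 0.0) + smoothing) / total for s in senses}
        mapping[pattern] = max(probs, key=probs.get)   # best ON sense for this pattern
    return mapping

# Example with hypothetical labels:
# learn_mapping([("pattern_1", "sense_1"), ("pattern_1", "sense_1"), ("pattern_2", "sense_2")])
```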
We present a supervised method for verb sense disambiguation based on VerbNet. Most previous supervised approaches to verb sense disambiguation create a classifier for each verb that reaches a frequency threshold. These methods, however, have a significant practical problem that they cannot be applied to rare or unseen verbs. In order to overcome this problem, we create a single classifier to be applied to rare or unseen verbs in a new text. This single classifier also exploits generalized semantic features of a verb and its modifiers in order to better deal with rare or unseen verbs. Our experimental results show that the proposed method achieves equivalent performance to per-verb classifiers, which cannot be applied to unseen verbs. Our classifier could be utilized to improve the classifications in lexical resources of verbs, such as VerbNet, in a semi-automatic manner and to possibly extend the coverage of these resources to new verbs.
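The core idea of a single cross-verb classifier can be sketched as follows: every instance, regardless of verb, is mapped to generalized features of the verb and its modifiers, and one model is trained over all verbs. The feature names, labels, and learner below are illustrative assumptions, not the paper's feature set.

```python
# A single classifier over all verbs, trained on generalized features so that it can
# still make predictions for rare or unseen verbs.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_features = [
    {"lemma": "pour",  "subject_class": "person", "object_class": "substance"},
    {"lemma": "spill", "subject_class": "person", "object_class": "substance"},
    {"lemma": "tell",  "subject_class": "person", "object_class": "communication"},
]
train_senses = ["class_A", "class_A", "class_B"]   # placeholder sense labels

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_features, train_senses)

# An unseen verb still gets a prediction: its unknown lemma feature is simply dropped
# by DictVectorizer, while the generalized argument features carry the decision.
print(model.predict([{"lemma": "trickle", "subject_class": "person",
                      "object_class": "substance"}]))
```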
Annotation of data is a time-consuming process, but necessary for many state-of-the-art solutions to NLP tasks, including semantic role labeling (SRL). In this paper, we show that language models may be used to select sentences that are more useful to annotate. We simulate a situation where only a portion of the available data can be annotated, and compare language model based selection against a more typical baseline of randomly selected data. The data is ordered using an off-the-shelf language modeling toolkit. We show that the least probable sentences provide dramatically improved system performance over the baseline, especially when only a small portion of the data is annotated. In fact, the lion’s share of the performance can be attained by annotating only 10-20% of the data. This result holds for training a model based on new annotation, as well as when adding domain-specific annotation to a general corpus for domain adaptation.
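The selection strategy itself is simple to sketch: score each candidate sentence with a language model and send the least probable ones to annotators first. The GPT-2 scorer below stands in for the off-the-shelf toolkit used in the paper.

```python
# Language-model-based data selection: annotate the least probable sentences first.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_log_prob(sentence):
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss          # mean token negative log-likelihood
    return -loss.item() * ids.size(1)            # total log-probability (natural log)

def select_for_annotation(pool, budget):
    """Return the `budget` least probable sentences from an unlabeled pool."""
    return sorted(pool, key=sentence_log_prob)[:budget]
```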
This article discusses the requirements of a formal specification for the annotation of temporal information in clinical narratives. We discuss the implementation and extension of ISO-TimeML for annotating a corpus of clinical notes, known as the THYME corpus. To reflect the information task and the heavily inference-based reasoning demands in the domain, a new annotation guideline has been developed, “the THYME Guidelines to ISO-TimeML (THYME-TimeML)”. To clarify what relations merit annotation, we distinguish between linguistically-derived and inferentially-derived temporal orderings in the text. We also apply a top performing TempEval 2013 system against this new resource to measure the difficulty of adapting systems to the clinical domain. The corpus is available to the community and has been proposed for use in a SemEval 2015 task.
This paper examines both linguistic behavior and practical implication of empty argument insertion in the Hindi PropBank. The Hindi PropBank is annotated on the Hindi Dependency Treebank, which contains some empty categories but not the empty arguments of verbs. In this paper, we analyze four kinds of empty arguments, *PRO*, *REL*, *GAP*, *pro*, and suggest effective ways of annotating these arguments. Empty arguments such as *PRO* and *REL* can be inserted deterministically; we present linguistically motivated rules that automatically insert these arguments with high accuracy. On the other hand, it is difficult to find deterministic rules to insert *GAP* and *pro*; for these arguments, we introduce a new annotation scheme that concurrently handles both semantic role labeling and empty category insertion, producing fast and high quality annotation. In addition, we present algorithms for finding antecedents of *REL* and *PRO*, and discuss why finding antecedents for some types of *PRO* is difficult.
In times of mass emergency, vast amounts of data are generated via computer-mediated communication (CMC) that are difficult to manually collect and organize into a coherent picture. Yet valuable information is broadcast, and can provide useful insight into time- and safety-critical situations if captured and analyzed efficiently and effectively. We describe a natural language processing component of the EPIC (Empowering the Public with Information in Crisis) Project infrastructure, designed to extract linguistic and behavioral information from tweet text to aid in the task of information integration. The system incorporates linguistic annotation, in the form of Named Entity Tagging, as well as behavioral annotations to capture tweets contributing to situational awareness and analyze the information type of the tweet content. We show classification results and describe future integration of these classifiers in the larger EPIC infrastructure.
This paper suggests a method for detecting cross-lingual semantic similarity using parallel PropBanks. We begin by improving word alignments for verb predicates generated by GIZA++ by using information available in parallel PropBanks. We applied the Kuhn-Munkres method to measure predicate-argument matching and improved verb predicate alignment F-score by 12.6%. Using the enhanced word alignments, we checked the set of target verbs aligned to a specific source verb for semantic consistency. For a set of English verbs aligned to a Chinese verb, we checked whether the English verbs belong to the same semantic class using an existing lexical database, WordNet. For a set of Chinese verbs aligned to an English verb, we manually checked semantic similarity between the Chinese verbs within the set. Our results show that the verb sets we generated have a high correlation with semantic classes. This could potentially lead to an automatic technique for generating semantic classes for verbs.
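The Kuhn-Munkres step can be sketched with scipy's Hungarian-algorithm solver; the argument similarity function is a placeholder for the alignment information actually used.

```python
# Given argument sets for an aligned source/target predicate pair, find the one-to-one
# argument matching that maximizes total similarity (Kuhn-Munkres / Hungarian algorithm).
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_arguments(src_args, tgt_args, similarity):
    """src_args, tgt_args: lists of argument spans; similarity: f(src, tgt) -> float."""
    cost = np.zeros((len(src_args), len(tgt_args)))
    for i, s in enumerate(src_args):
        for j, t in enumerate(tgt_args):
            cost[i, j] = -similarity(s, t)          # negate: the solver minimizes cost
    rows, cols = linear_sum_assignment(cost)
    return [(src_args[i], tgt_args[j], -cost[i, j]) for i, j in zip(rows, cols)]

# Toy usage with a placeholder similarity that only compares role labels:
# match_arguments(["ARG0: the UN", "ARG1: sanctions"], ["ARG0: 联合国", "ARG1: 制裁"],
#                 lambda s, t: 1.0 if s.split(":")[0] == t.split(":")[0] else 0.0)
```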
This paper gives guidelines of how to create and update Propbank frameset files using a dedicated editor, Cornerstone. Propbank is a corpus in which the arguments of each verb predicate are annotated with their semantic roles in relation to the predicate. Propbank annotation also requires the choice of a sense ID for each predicate. Thus, for each predicate in Propbank, there exists a corresponding frameset file showing the expected predicate argument structure of each sense related to the predicate. Since most Propbank annotations are based on the predicate argument structure defined in the frameset files, it is important to keep the files consistent, simple to read as well as easy to update. The frameset files are written in XML, which can be difficult to edit when using a simple text editor. Therefore, it is helpful to develop a user-friendly editor such as Cornerstone, specifically customized to create and edit frameset files. Cornerstone runs platform independently, is light enough to run as an X11 application and supports multiple languages such as Arabic, Chinese, English, Hindi and Korean.
We are in the process of creating a multi-representational and multi-layered treebank for Hindi/Urdu (Palmer et al., 2009), which has three main layers: dependency structure, predicate-argument structure (PropBank), and phrase structure. This paper discusses an important issue in treebank design which is often neglected: the use of empty categories (ECs). All three levels of representation make use of ECs. We make a high-level distinction between two types of ECs, trace and silent, on the basis of whether they are postulated to mark displacement or not. Each type is further refined into several subtypes based on the underlying linguistic phenomena which the ECs are introduced to handle. This paper discusses the stages at which we add ECs to the Hindi/Urdu treebank and why. We investigate methodically the different types of ECs and their role in our syntactic and semantic representations. We also examine our decisions whether or not to coindex each type of ECs with other elements in the representation.
This paper gives guidelines of how to annotate Propbank instances using a dedicated editor, Jubilee. Propbank is a corpus in which the arguments of each verb predicate are annotated with their semantic roles in relation to the predicate. Propbank annotation also requires the choice of a sense ID for each predicate. Jubilee facilitates this annotation process by displaying several resources of syntactic and semantic information simultaneously: the syntactic structure of a sentence is displayed in the main frame, the available senses with their corresponding argument structures are displayed in another frame, all available Propbank arguments are displayed for the annotator's choice, and example annotations of each sense of the predicate are available to the annotator for viewing. Easy access to each of these resources allows the annotator to quickly absorb and apply the necessary syntactic and semantic information pertinent to each predicate for consistent and efficient annotation. Jubilee has been successfully adapted to many Propbank projects in several universities. The tool runs platform independently, is light enough to run as an X11 application and supports multiple languages such as Arabic, Chinese, English, Hindi and Korean.
This study attempts to pinpoint the factors that restrict reliable word sense annotation, focusing on the influence of the number of senses annotators use and the semantic granularity of those senses. Both of these factors may be possible causes of low interannotator agreement (ITA) when tagging with fine-grained word senses, and, consequently, low WSD system performance (Ng et al., 1999; Snyder & Palmer, 2004; Chklovski & Mihalcea, 2002). If number of senses is the culprit, modifying the task to show fewer senses at a time could improve annotator reliability. However, if overly nuanced distinctions are the problem, then more general, coarse-grained distinctions may be necessary for annotator success and may be all that is needed to supply systems with the types of distinctions that people make. We describe three experiments that explore the role of these factors in annotation performance. Our results indicate that of these two factors, only the granularity of the senses restricts interannotator agreement, with broader senses resulting in higher annotation reliability.
LRs remain expensive to create and thus rare relative to demand across languages and technology types. The accidental re-creation of an LR that already exists is a nearly unforgivable waste of scarce resources that is unfortunately not so easy to avoid. The number of catalogs the HLT researcher must search, with their different formats, makes it possible to overlook an existing resource. This paper sketches the sources of this problem and outlines a proposal to rectify it, along with a new vision of LR cataloging that will facilitate the documentation and exploitation of a much wider range of LRs than previously considered.
This paper summarizes the annotation of fine-grained entailment relationships in the context of student answers to science assessment questions. We annotated a corpus of 15,357 answer pairs with 145,911 fine-grained entailment relationships. We provide the rationale for such fine-grained analysis and discuss its perceived benefits to an Intelligent Tutoring System. The corpus also has potential applications in other areas, such as question answering and multi-document summarization. Annotators achieved 86.2% inter-annotator agreement (Kappa=0.728, corresponding to substantial agreement) when annotating the fine-grained facets of reference answers with regard to the understanding expressed in student answers, labeling each facet with one of five possible detailed relationship categories. The corpus described in this paper, which is the only one providing such detailed entailment annotations, is available as a public resource for the research community. It is expected to enable application development that is currently not practical, not only for intelligent tutoring systems but also for general textual entailment applications.
In this paper, we present the details of creating a pilot Arabic proposition bank (Propbank). Propbanks exist for both English and Chinese. However the morphological and syntactic expression of linguistic phenomena in Arabic yields a very different type of process in creating an Arabic propbank. Hence, we highlight those characteristics of Arabic that make creating a propbank for the language a different challenge compared to the creation of an English Propbank. We believe that many of the lessons learned in dealing with Arabic could generalise to other languages that exhibit equally rich morphology and relatively free word order.
Lexical classifications have proved useful in supporting various natural language processing (NLP) tasks. The largest verb classification for English is Levin's (1993) work which defined groupings of verbs based on syntactic properties. VerbNet - the largest computational verb lexicon currently available for English - provides detailed syntactic-semantic descriptions of Levin classes. While the classes included are extensive enough for some NLP use, they are not comprehensive. Korhonen and Briscoe (2004) have proposed a significant extension of Levin's classification which incorporates 57 novel classes for verbs not covered (comprehensively) by Levin. This paper describes the integration of these classes into VerbNet. The result is the most extensive Levin-style classification for English verbs which can be highly useful for practical applications.
As an approach to syntax-based statistical machine translation (SMT), Probabilistic Synchronous Dependency Insertion Grammars (PSDIG), introduced in (Ding and Palmer, 2005), are a version of synchronous grammars defined on dependency trees. In this paper we discuss better learning and decoding algorithms for a PSDIG MT system. We introduce two new grammar learners: (1) an exhaustive learner combining different heuristics, (2) an n-gram based grammar learner. Combining the grammar rules learned from the two learners improved the performance. We introduce a better decoding algorithm which incorporates a tri-gram language model. According to the Bleu metric, the PSDIG MT system performance is significantly better than IBM Model 4, while on par with the state-of-the-art phrase-based system Pharaoh (Koehn, 2004). The improved integration of syntax on both source and target languages opens the door to more sophisticated SMT processes.
Structural divergence presents a challenge to the use of syntax in statistical machine translation. We address this problem with a new algorithm for alignment of loosely matched non-isomorphic dependency trees. The algorithm selectively relaxes the constraints of the two tree structures while keeping computational complexity polynomial in the length of the sentences. Experimentation with a large Chinese-English corpus shows an improvement in alignment results over the unstructured models of (Brown et al., 1993).
This paper describes an approach for handling structural divergences and recovering dropped arguments in an implemented Korean to English machine translation system. The approach relies on canonical predicate-argument structures (or dependency structures), which provide a suitable pivot representation for the handling of structural divergences and the recovery of dropped arguments. It can also be converted to and from the interface representations of many off-the-shelf parsers and generators.
Research in computational linguistics, computer graphics and autonomous agents has led to the development of increasingly sophisticated communicative agents over the past few years, bringing new perspective to machine translation research. The engineering of language-based, smooth, expressive, natural-looking human gestures can give us useful insights into the design principles that have evolved in natural communication between people. In this paper we prototype a machine translation system from English to American Sign Language (ASL), taking into account not only linguistic but also visual and spatial information associated with ASL signs.
This paper reports on an experiment in assembling a domain-specific machine translation prototype system from off-the-shelf components. The design goals of this experiment were to reuse existing components, to use machine-learning techniques for parser specialization and for transfer lexicon extraction, and to use an expressive, lexicalized formalism for the transfer component.