Linguistic Issues in Language Technology, Volume 14, 2016 - Modality: Logic, Semantics, Annotation, and Machine Learning
Until rather recently, Natural Language Processing has not given much attention to modality. As long as the main task was to determine what a text was about (Information Retrieval) or who the participants in an eventuality were (Information Extraction), this neglect was understandable. With the focus moving to questions of natural language understanding and inferencing as well as to sentiment and opinion analysis, it becomes necessary to distinguish between actual and envisioned eventualities and to draw conclusions about the attitude of the writer or speaker towards the eventualities referred to. This means, among other things, being able to distinguish between ‘John went to Paris’ and ‘John wanted to go to Paris’. To do this, one has to calculate the effect of different linguistic operators on the eventuality predication.
Classical theories of discourse semantics, such as Discourse Representation Theory (DRT) and Dynamic Predicate Logic (DPL), predict that an indefinite noun phrase cannot serve as antecedent for an anaphor if the noun phrase is, but the anaphor is not, in the scope of a modal expression. However, this prediction meets with counterexamples. The phenomenon of modal subordination is one of them. In general, modal subordination is concerned with more than two modalities, where the modality in subsequent sentences is interpreted in a context ‘subordinate’ to the one created by the first modal expression. In other words, subsequent sentences are interpreted as being conditional on the scenario introduced in the first sentence. One consequence is that the anaphoric potential of indefinites may extend beyond the standard limits of accessibility constraints. This paper aims to give a formal interpretation of modal subordination. The theoretical backbone of the current work is Type Theoretic Dynamic Logic (TTDL), which is a Montagovian account of discourse semantics. Unlike other dynamic theories, TTDL is built on classical mathematical and logical tools, such as λ-calculus and Church’s theory of types. Hence it is completely compositional and does not suffer from the destructive assignment problem. We will review the basic set-up of TTDL and then present Kratzer’s theory of natural language modality. After that, by integrating the notion of conversation background, in particular, the modal base usage, we offer an extension of TTDL (called Modal-TTDL, or M-TTDL for short) which properly deals with anaphora across modality. The formal relation between Modal-TTDL and TTDL will be discussed as well.
Modal Sense Classification At Large: Paraphrase-Driven Sense Projection, Semantically Enriched Classification Models and Cross-Genre Evaluations
Ana Marasović | Mengfei Zhou | Alexis Palmer | Anette Frank
Modal verbs have different interpretations depending on their context. Their sense categories – epistemic, deontic and dynamic – provide important dimensions of meaning for the interpretation of discourse. Previous work on modal sense classification achieved relatively high performance using shallow lexical and syntactic features drawn from small-size annotated corpora. Due to the restricted empirical basis, it is difficult to assess the particular difficulties of modal sense classification and the generalization capacity of the proposed models. In this work we create large-scale, high-quality annotated corpora for modal sense classification using an automatic paraphrase-driven projection approach. Using the acquired corpora, we investigate the modal sense classification task from different perspectives. We uncover the difficulty of specific sense distinctions by investigating distributional bias and reducing the sparsity of existing small-scale corpora used in prior work. We build a semantically enriched model for modal sense classification by designing novel features related to lexical, proposition-level and discourse-level semantic factors. Besides improved classification performance, closer examination of interpretable feature sets unveils relevant semantic and contextual factors in modal sense classification. Finally, we investigate genre effects on modal sense distribution and how they affect classification performance. Our investigations uncover the difficulty of specific sense distinctions and how they are affected by training set size and distributional bias. Our large-scale experiments confirm that semantically enriched models outperform models built on shallow feature sets. Cross-genre experiments shed light on differences in sense distributions across genres and confirm that semantically enriched models have high generalization capacity, especially in unstable distributional settings.
In this paper we present current work on the design and validation of a linguistically-motivated annotation model of modality in English and Spanish in the context of the MULTINOT project. Our annotation model captures four basic modal meanings and their subtypes, on the one hand, and provides a fine-grained characterisation of the syntactic realisations of those meanings in English and Spanish, on the other. We validate the proposed modal tagset through an agreement study performed on a bilingual sample of four hundred sentences extracted from original texts of the MULTINOT corpus, and discuss the difficult cases encountered in the annotation experiment. We also describe current steps in the implementation of the proposed scheme for the large-scale annotation of the bilingual corpus using both automatic and manual procedures.
We investigate modality in Portuguese, combining a linguistic perspective with an application-oriented perspective on modality. We design an annotation scheme reflecting theoretical linguistic concepts and apply this scheme to a small corpus sample to show how it deals with real-world language usage. We present two schemes for Portuguese, one for spoken Brazilian Portuguese and one for written European Portuguese. Furthermore, we use the annotated data not only to study the linguistic phenomena of modality, but also to train a practical text mining tool to detect modality in text automatically. The modality tagger uses a machine learning classifier trained on automatically extracted features from a syntactic parser. As we only have a small annotated sample available, the tagger was evaluated on 11 modal verbs that are frequent in our corpus and that denote more than one modal meaning. Finally, we discuss several valuable insights into the complexity of the semantic concept of modality that derive from the process of manual annotation of the corpus and from the analysis of the results of the automatic labeling: ambiguity and the semantic and syntactic properties typically associated with one modal meaning in context, and also the interaction of modality with negation and focus. The knowledge gained from the manual annotation task leads us to propose a new unified scheme for modality that applies to the two Portuguese varieties and covers both written and spoken data.
Modal auxiliaries have different readings, depending on the context in which they occur (Kratzer, 1981). Several projects have attempted to classify uses of modal auxiliaries in corpora according to their reading using supervised machine learning techniques (e.g., Rubinstein et al., 2013, Ruppenhofer & Rehbein, 2012). In each study, traditional taxonomic labels, such as ‘epistemic’ and ‘deontic’, are used by human annotators to label instances of modal auxiliaries in a corpus. In order to achieve higher agreement among annotators, results in these previous studies are reported after collapsing some of the initial categories. The results show that human annotators have fairly good agreement on some of the categories, such as whether or not a use is epistemic, but poor agreement on others. They also show that annotators agree more on modals such as might than on modals such as could. In this study, we applied traditional taxonomic categories to sentences containing modal auxiliary verbs that were randomly extracted from the English Gigaword 4th edition corpus (Parker et al., 2009). The lowest inter-annotator agreement using traditional taxonomic labels occurred with uses of could, with raw agreements of 42%−48% (κ = 0.196−0.259), compared to might, for instance, with raw agreement of 98%. In response to the low numbers, rather than collapsing traditional categories, we tried a new method of classifying uses of could with respect to where the reading situates the eventuality being described relative to the speech time. For example, the sentence ‘Jess could swim.’ is about a swimming eventuality in the past leading up to the time of speech, if it is read as being an ability. The sentence is about a swimming eventuality in the future, if it is read as being a statement about a possibility. The classification labels we propose are crucial in separating uses of could that have actuality inferences (Bhatt, 1999, Hacquard, 2006) from uses that do not.
For the temporal location of the event described by a use of could, using four category labels, we achieved 73%−90% raw agreement (κ = 0.614−0.744). Sequence of tense contexts (Abusch, 1997) present a major factor in the difficulty of determining the temporal properties present in uses of could. Among three annotators, we achieved raw agreement scores of 89%−96% (κ = 0.779−0.919) on identification of sequence of tense contexts. We discuss the role of our findings with respect to textual entailment.
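The raw agreement and κ figures above can be related with a small worked sketch. It assumes that κ is Cohen's kappa computed for one pair of annotators at a time, which the abstract does not state; the label names `past` and `future` and the function names are hypothetical and serve only as illustration.

```python
from collections import Counter


def raw_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = raw_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same
    # label if each labels independently according to their own distribution.
    p_expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical annotations of four uses of "could" (labels are made up).
annotator_1 = ["past", "past", "future", "future"]
annotator_2 = ["past", "future", "future", "future"]

print(raw_agreement(annotator_1, annotator_2))  # 0.75
print(cohen_kappa(annotator_1, annotator_2))    # 0.5
```

In this toy case the annotators agree on three of four items (75% raw agreement), but half of that agreement is expected by chance, so κ is 0.5. This is why a high raw agreement can coexist with a much lower κ when the label distribution is skewed.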
Verbal irony, or sarcasm, presents a significant technical and conceptual challenge when it comes to automatic detection. Moreover, it can be a disruptive factor in sentiment analysis and opinion mining, because it changes the polarity of a message implicitly. Extant methods for automatic detection are mostly based on overt clues to ironic intent such as hashtags, also known as irony markers. In this paper, we investigate whether people who know each other make use of irony markers less often than people who do not know each other. We trained a machine-learning classifier to detect sarcasm in Twitter messages (tweets) that were addressed to specific users, and in tweets that were not addressed to a particular user. Human coders analyzed the top-1000 features found to be most discriminative into ten categories of irony markers. The classifier was also tested within and across the two categories. We find that tweets with a user mention contain fewer irony markers than tweets not addressed to a particular user. Classification experiments confirm that the irony in the two types of tweets is signaled differently. The within-category performance of the classifier is about 91% for both categories, while cross-category experiments yield substantially lower generalization performance scores of 75% and 71%. We conclude that irony markers are used more often when there is less mutual knowledge between sender and receiver. Senders addressing other Twitter users less often use irony markers, relying on mutual knowledge which should lead the receiver to infer ironic intent from more implicit clues. With regard to automatic detection, we conclude that our classifier is able to detect ironic tweets addressed at another user as reliably as tweets that are not addressed at a particular person.
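The comparison between addressed and unaddressed tweets can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that a tweet counts as addressed when it contains an `@user` mention and that a hashtag is the only irony marker considered, whereas the study codes ten categories of markers. The regular expressions, function names and example tweets are all hypothetical.

```python
import re

# Assumption: a mention or hashtag starts at a non-word boundary,
# so e-mail addresses such as "a@b.org" are not counted as mentions.
MENTION = re.compile(r"(?<!\w)@\w+")
HASHTAG = re.compile(r"(?<!\w)#\w+")


def split_by_mention(tweets):
    """Split tweets into those addressed to a user and those that are not."""
    addressed = [t for t in tweets if MENTION.search(t)]
    unaddressed = [t for t in tweets if not MENTION.search(t)]
    return addressed, unaddressed


def marker_rate(tweets):
    """Fraction of tweets containing at least one hashtag marker."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if HASHTAG.search(t)) / len(tweets)


# Invented example tweets, for illustration only.
tweets = [
    "@anna great job, really",
    "Love Mondays #not",
    "@bob sure #sarcasm",
    "so fun #sarcasm",
]

addressed, unaddressed = split_by_mention(tweets)
print(marker_rate(addressed))    # 0.5
print(marker_rate(unaddressed))  # 1.0
```

On real data, a lower marker rate in the addressed group would be in line with the finding reported above; the toy numbers here carry no evidential weight.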