2024
pdf
abs
What to Annotate: Retrieving Lexical Markers of Conspiracy Discourse from an Italian-English Corpus of Telegram Data
Costanza Marini
|
Elisabetta Jezek
Proceedings of the 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation @ LREC-COLING 2024
In this age of social media, Conspiracy Theories (CTs) have become an issue that can no longer be ignored. After providing an overview of CT literature and corpus studies, we describe the creation of a 40,000-token English-Italian bilingual corpus of conspiracy-oriented Telegram comments – the Complotto corpus – and the linguistic analysis we performed using the Sketch Engine online platform (Kilgarriff et al., 2010) on our annotated data to identify statistically relevant linguistic markers of CT discourse. Thanks to the platform’s keywords and key terms extraction functions, we were able to assess the statistical significance of the following lexical and semantic phenomena, both cross-linguistically and cross-CT, namely: (1) evidentiality and epistemic modality markers; (2) debunking vocabulary referring to another version of the truth lying behind the official one; (3) the conceptual metaphor INSTITUTIONS ARE ABUSERS. All these features qualify as markers of CT discourse and have the potential to be effectively used for future semantic annotation tasks to develop automatic systems for CT identification.
2023
pdf
abs
Why Don’t You Do It Right? Analysing Annotators’ Disagreement in Subjective Tasks
Marta Sandri
|
Elisa Leonardelli
|
Sara Tonelli
|
Elisabetta Jezek
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Annotators’ disagreement in linguistic data has recently been the focus of multiple initiatives aimed at raising awareness of issues related to ‘majority voting’ when aggregating diverging annotations. Disagreement can indeed reflect different aspects of linguistic annotation, from annotators’ subjectivity to sloppiness or lack of enough context to interpret a text. In this work, we first propose a taxonomy of possible reasons leading to annotators’ disagreement in subjective tasks. Then, we manually label part of a Twitter dataset for offensive language detection in English following this taxonomy, identifying how the different categories are distributed. Finally, we run a set of experiments aimed at assessing the impact of the different types of disagreement on classification performance. In particular, we investigate how accurately tweets belonging to different categories of disagreement can be classified as offensive or not, and how injecting data with different types of disagreement in the training set affects performance. We also perform offensive language detection in a multi-task framework, using disagreement classification as an auxiliary task.
pdf
abs
Identifying Semantic Argument Types in Predication and Copredication Contexts: A Zero-Shot Cross-Lingual Approach
Deniz Ekin Yavas
|
Laura Kallmeyer
|
Rainer Osswald
|
Elisabetta Jezek
|
Marta Ricchiardi
|
Long Chen
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Identifying semantic argument types in predication contexts is not a straightforward task for several reasons, such as inherent polysemy, coercion, and copredication phenomena. In this paper, we train monolingual and multilingual classifiers with a zero-shot cross-lingual approach to identify semantic argument types in predications using pre-trained language models as feature extractors. We train classifiers for different semantic argument types and for both verbal and adjectival predications. Furthermore, we propose a method to detect copredication using these classifiers through identifying the argument semantic type targeted in different predications over the same noun in a sentence. We evaluate the performance of the method on copredication test data with Food•Event nouns for 5 languages.
2022
pdf
abs
Annotating Propositional Attitude Verbs and their Arguments
Marta Ricchiardi
|
Elisabetta Jezek
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022
This paper describes the results of an empirical study on attitude verbs and propositional attitude reports in Italian. Within the framework of a project aiming at acquiring argument structures for Italian verbs from corpora, we carried out a systematic annotation aimed at identifying which verbs are actually attitude verbs in Italian. The result is a list of 179 argument structures based on corpus-derived patterns of use for 126 verbs that behave as attitude verbs. The distribution of these verbs in the corpus suggests that not only the canonical that-clauses, i.e. subordinates introduced by the complementizer che, but also direct speech, infinitives introduced by the complementizer di, and some nominals are good candidates to express propositional contents in propositional attitude reports. The annotation also sheds light on some issues between semantics and ontology, concerning the relation between events and propositions.
2020
pdf
abs
Annotating Croatian Semantic Type Coercions in CROATPAS
Costanza Marini
|
Elisabetta Jezek
Proceedings of the 16th Joint ACL-ISO Workshop on Interoperable Semantic Annotation
This short research paper presents the results of a corpus-based metonymy annotation exercise on a sample of 101 Croatian verb entries – corresponding to 457 patterns and over 20,000 corpus lines – taken from CROATPAS (Marini & Ježek, 2019), a digital repository of verb argument structures manually annotated with Semantic Type labels on their argument slots following a methodology inspired by Corpus Pattern Analysis (Hanks, 2004 & 2013; Hanks & Pustejovsky, 2005). CROATPAS will be made available online in 2020. Semantic Type labelling is not only well-suited to annotate verbal polysemy, but also metonymic shifts in verb argument combinations, which in Generative Lexicon (Pustejovsky, 1995 & 1998; Pustejovsky & Ježek, 2008) are called Semantic Type coercions. From a sublexical point of view, Semantic Type coercions can be considered as exploitations of one of the qualia roles of those Semantic Types which do not satisfy a verb’s selectional requirements, but do not trigger a different verb sense. Overall, we were able to identify 62 different Semantic Type coercions linked to 1,052 metonymic corpus lines. In the future, we plan to compare our results with those from an equivalent study on Italian verbs (Romani, 2020) for a crosslinguistic analysis of metonymic shifts.
2019
pdf
bib
abs
A Distributional Model of Affordances in Semantic Type Coercion
Stephen McGregor
|
Elisabetta Jezek
Proceedings of the 13th International Conference on Computational Semantics - Short Papers
We explore a novel application for interpreting semantic type coercions, motivated by insight into the role that perceptual affordances play in the selection of the semantic roles of artefactual nouns which are observed as arguments for verbs which would stereotypically select for objects of a different type. In order to simulate affordances, which we take to be direct perceptions of context-specific opportunities for action, we perform a distributional analysis of dependency relationships between target words and their modifiers and adjuncts. We use these relationships as the basis for generating on-line transformations which project semantic subspaces in which the interpretations of coercive compositions are expected to emerge as salient word-vectors. We offer some preliminary examples of how this model operates on a dataset of sentences involving coercive interactions between verbs and objects specifically designed to evaluate this work.
2018
pdf
Enriching a Lexicon of Discourse Connectives with Corpus-based Data
Anna Feltracco
|
Elisabetta Jezek
|
Bernardo Magnini
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2017
pdf
abs
Dynamic Argument Structure
Elisabetta Jezek
Linguistic Issues in Language Technology, Volume 15, 2017
This paper presents a new classification of verbs of change and modification, proposing a dynamic interpretation of the lexical semantics of the predicate and its arguments. Adopting the model of dynamic event structure proposed in Pustejovsky (2013), and extending the model of dynamic selection outlined in Pustejovsky and Jezek (2011), we define a verb class in terms of its Dynamic Argument Structure (DAS), a representation which encodes how the participants involved in the change behave as the event unfolds. We address how the logical resources and results of change predicates are realized syntactically, if at all, as well as how the exploitation of the resource results in the initiation or termination of a new object, i.e. the result. We show how DAS can be associated with a dynamically encoded event structure representation, which measures the change making reference to a scalar component, modelled in terms of assignment and/or testing of values of attributes of participants.
pdf
A Geometric Method for Detecting Semantic Coercion
Stephen McGregor
|
Elisabetta Jezek
|
Matthew Purver
|
Geraint Wiggins
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Long papers
2016
pdf
abs
Using WordNet to Build Lexical Sets for Italian Verbs
Anna Feltracco
|
Lorenzo Gatti
|
Elisabetta Jezek
|
Bernardo Magnini
|
Simone Magnolini
Proceedings of the 8th Global WordNet Conference (GWC)
We present a methodology for building lexical sets for argument slots of Italian verbs. We start from an inventory of semantically typed Italian verb frames and, through a mapping to WordNet, we automatically annotate the sets of fillers for the argument positions in a corpus of sentences. We evaluate both a baseline algorithm and a syntax-driven algorithm and show that the latter performs significantly better in terms of precision.
pdf
abs
Acquiring Opposition Relations among Italian Verb Senses using Crowdsourcing
Anna Feltracco
|
Simone Magnolini
|
Elisabetta Jezek
|
Bernardo Magnini
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
We describe an experiment for the acquisition of opposition relations among Italian verb senses, based on a crowdsourcing methodology. The goal of the experiment is to discuss whether the types of opposition we distinguish (i.e. complementarity, antonymy, converseness and reversiveness) are actually perceived by the crowd. In particular, we collect data for Italian by using the crowdsourcing platform CrowdFlower. We ask annotators to judge the type of opposition existing among pairs of sentences (previously judged as opposite) that differ only in one verb: the verb in the first sentence is the opposite of the verb in the second sentence. Data corroborate the hypothesis that some opposition relations exclude each other, while others interact, being recognized as compatible by the contributors.
2015
pdf
Instrument subjects without Instrument role
Elisabetta Ježek
|
Rossella Varvara
Proceedings of the 11th Joint ACL-ISO Workshop on Interoperable Semantic Annotation (ISA-11)
pdf
Opposition Relations among Verb Frames
Anna Feltracco
|
Elisabetta Jezek
|
Bernardo Magnini
Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation
pdf
Corpus Patterns for Semantic Processing
Octavian Popescu
|
Patrick Hanks
|
Elisabetta Jezek
|
Daisuke Kawahara
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing: Tutorial Abstracts
2014
pdf
abs
T-PAS; A resource of Typed Predicate Argument Structures for linguistic analysis and semantic processing
Elisabetta Jezek
|
Bernardo Magnini
|
Anna Feltracco
|
Alessia Bianchini
|
Octavian Popescu
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
The goal of this paper is to introduce T-PAS, a resource of typed predicate argument structures for Italian, acquired from corpora by manual clustering of distributional information about Italian verbs, to be used for linguistic analysis and semantic processing tasks. T-PAS is the first resource for Italian in which semantic selection properties and sense-in-context distinctions of verbs are characterized fully on empirical grounds. In the paper, we first describe the process of pattern acquisition and corpus annotation (section 2) and its ongoing evaluation (section 3). We then demonstrate the benefits of pattern tagging for NLP purposes (section 4), and discuss current efforts to improve the annotation of the corpus (section 5). We conclude by reporting on ongoing experiments using semiautomatic techniques for extending coverage (section 6).
2013
pdf
Sweetening Ontologies cont’d
Elisabetta Jezek
Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora
2012
pdf
abs
Annotating Qualia Relations in Italian and French Complex Nominals
Pierrette Bouillon
|
Elisabetta Jezek
|
Chiara Melloni
|
Aurélie Picton
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
The goal of this paper is to provide an annotation scheme for compounds based on generative lexicon theory (GL, Pustejovsky, 1995; Bassac and Bouillon, 2001). This scheme has been tested on a set of compounds automatically extracted from the Europarl corpus (Koehn, 2005) both in Italian and French. The motivation is twofold. On the one hand, it should help refine existing compound classifications and better explain lexicalization in both languages. On the other hand, we hope that the extracted generalizations can be used in NLP, for example for improving MT systems or for query reformulation (Claveau, 2003). In this paper, we focus on the annotation scheme and its ongoing evaluation.
2011
pdf
Senso Comune, an Open Knowledge Base of Italian Language
Guido Vetere
|
Alessandro Oltramari
|
Isabella Chiari
|
Elisabetta Jezek
|
Laure Vieu
|
Fabio Massimo Zanzotto
Traitement Automatique des Langues, Volume 52, Numéro 3 : Ressources linguistiques libres [Free Language Resources]
2010
pdf
abs
Capturing Coercions in Texts: a First Annotation Exercise
Elisabetta Jezek
|
Valeria Quochi
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
In this paper, we report the first results of an annotation exercise of argument coercion phenomena performed on Italian texts. Our corpus consists of ca. 4000 sentences from the PAROLE sottoinsieme corpus (Bindi et al. 2000) annotated with Selection and Coercion relations among verb-noun pairs formatted in XML according to the Generative Lexicon Mark-up Language (GLML) format (Pustejovsky et al., 2008). For the purposes of coercion annotation, we selected 26 Italian verbs that impose semantic typing on their arguments in either Subject, Direct Object or Complement position. Every sentence of the corpus is annotated with the source type for the noun arguments by two annotators plus a judge. An overall agreement of 0.87 kappa indicates that the annotation methodology is reliable. A qualitative analysis of the results allows us to outline some suggestions for improvement of the task: 1) a different account of complex types for nouns has to be devised and 2) a more comprehensive account of coercion mechanisms requires annotation of the deeper meaning dimensions that are targeted in coercion operations, such as those captured by Qualia relations.
pdf
SemEval-2010 Task 7: Argument Selection and Coercion
James Pustejovsky
|
Anna Rumshisky
|
Alex Plotnick
|
Elisabetta Jezek
|
Olga Batiukova
|
Valeria Quochi
Proceedings of the 5th International Workshop on Semantic Evaluation