2020
The Annotation of Thematic Structure and Alternations face to the Semantic Variation of Action Verbs. Current Trends in the IMAGACT Ontology
Massimo Moneglia, Rossella Varvara
16th Joint ACL - ISO Workshop on Interoperable Semantic Annotation PROCEEDINGS
We present some issues in the development of the semantic annotation of IMAGACT, a multimodal and multilingual ontology of actions. The resource is structured on action concepts that are meant to be cognitive entities and to which a linguistic caption is attached. For each of these concepts, we annotate the minimal thematic structure of the caption and the possible argument alternations allowed. We present some insights on this process with regard to the notion of thematic structure and the relationship between action concepts and linguistic expressions. From the empirical evidence provided by the annotation, we discuss the very nature of thematic structure, arguing that it is neither a property of the verb itself nor a property of action concepts. We further show how thematic structure relates to (1) the semantic variation of action verbs and (2) the lexical variation of action concepts.
2014
The IMAGACT Visual Ontology. An Extendable Multilingual Infrastructure for the representation of lexical encoding of Action
Massimo Moneglia, Susan Brown, Francesca Frontini, Gloria Gagliardi, Fahad Khan, Monica Monachini, Alessandro Panunzi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Action verbs have many meanings, covering actions in different ontological types. Moreover, each language categorizes action in its own way. One verb can refer to many different actions and one action can be identified by more than one verb. The range of variations within and across languages is largely unknown, causing trouble for natural language processing tasks. IMAGACT is a corpus-based ontology of action concepts, derived from English and Italian spontaneous speech corpora, which makes use of the universal language of images to identify the different action types extended by verbs referring to action in English, Italian, Chinese and Spanish. This paper presents the infrastructure and the various linguistic information the user can derive from it. IMAGACT makes explicit the variation of meaning of action verbs within one language and allows comparisons of verb variations within and across languages. Because the action concepts are represented with videos, extension into new languages beyond those presently implemented in IMAGACT is done using competence-based judgments by mother-tongue informants without intense lexicographic work involving underdetermined semantic description.
2012
The IMAGACT Cross-linguistic Ontology of Action. A new infrastructure for natural language disambiguation
Massimo Moneglia, Monica Monachini, Omar Calabrese, Alessandro Panunzi, Francesca Frontini, Gloria Gagliardi, Irene Russo
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Action verbs, which are highly frequent in speech, cause disambiguation problems that are relevant to Language Technologies. This is a consequence of the peculiar way each natural language categorizes Action, i.e. it is a consequence of semantic factors. Action verbs are frequently general, since they extend productively to actions belonging to different ontological types. Moreover, each language categorizes action in its own way and therefore the cross-linguistic reference to everyday activities is puzzling. This paper briefly sketches the IMAGACT project, which aims at setting up a cross-linguistic Ontology of Action for grounding disambiguation tasks in this crucial area of the lexicon. The project derives information on the actual variation of action verbs in English and Italian from spontaneous speech corpora, where references to action are high in frequency. Crucially, it makes use of the universal language of images to identify action types, avoiding the underdeterminacy of semantic definitions. Action concept entries are implemented as prototypic scenes; this will make it easier to extend the Ontology to other languages.
RIDIRE-CPI: an Open Source Crawling and Processing Infrastructure for Supervised Web-Corpora Building
Alessandro Panunzi, Marco Fabbri, Massimo Moneglia, Lorenzo Gregori, Samuele Paladini
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper introduces the RIDIRE-CPI, an open source tool for the building of web corpora with a specific design through a targeted crawling strategy. The tool has been developed within the RIDIRE Project, which aims at creating a 2 billion word balanced web corpus for Italian. The RIDIRE-CPI architecture integrates existing open source tools as well as modules developed specifically within the RIDIRE project. It consists of various components: a robust crawler (Heritrix), a user-friendly web interface, several conversion and cleaning tools, an anti-duplicate filter, a language guesser, and a PoS tagger. The RIDIRE-CPI user-friendly interface is specifically intended to allow collaborative work by users with limited skills in web technology and text processing. Moreover, RIDIRE-CPI integrates a validation interface dedicated to the evaluation of the targeted crawling. Through the content selection, metadata assignment, and validation procedures, the RIDIRE-CPI allows the gathering of linguistic data with a supervised strategy that leads to a higher level of control of the corpus contents. The modular architecture of the infrastructure and its open-source distribution will ensure the reusability of the tool for other corpus building initiatives.
2006
Integrating Methods and LRs for Automatic Keyword Extraction from Open Domain Texts
Alessandro Panunzi, Marco Fabbri, Massimo Moneglia
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
The paper presents a tool for keyword extraction from multilingual resources developed within the AXMEDIS project. In this tool, lexical collocations (Sinclair, 1991) internal to documents are used to enhance the performance obtained through a standard statistical procedure. A first set of mono-term keywords is extracted through the TF.IDF algorithm (Salton, 1989). The internal analysis of the document generates a second set of multi-term keywords based on the first set, rather than on multi-term frequency comparison with a general resource (Witten et al., 1999). Collocations in which a mono-term keyword occurs as the head are considered as multi-term keywords, and are assumed to increase the identification of the content. The evaluation compares the results of the TF.IDF procedure and the ones obtained with the enhanced procedure in terms of precision. Each set of keywords received a value from the point of view of a possible user, regarding: (a) overall efficiency of the whole set of keywords for the identification of the content; (b) adequacy of each extracted keyword. Results show that multi-term keywords increase the content identification with a 100% relative factor and that the adequacy is enhanced in 33% of cases.
2004
Using PiTagger for Lemmatization and PoS Tagging of a Spontaneous Speech Corpus: C-Oral-Rom Italian
Alessandro Panunzi, Eugenio Picchi, Massimo Moneglia
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
Measurements of Spoken Language Variability in a Multilingual Corpus. Predictable Aspects
Massimo Moneglia
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
Evaluation of Consensus on the Annotation of Prosodic Breaks in the Romance Corpus of Spontaneous Speech “C-ORAL-ROM”
Morena Danieli, Juan María Garrido, Massimo Moneglia, Andrea Panizza, Silvia Quazza, Marc Swerts
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
2002
The C-ORAL-ROM Project. New methods for spoken language archives in a multilingual romance corpus
Emanuela Cresti, Massimo Moneglia, Fernanda Bacelar do Nascimento, Antonio Moreno Sandoval, Jean Veronis, Philippe Martin, Kalid Choukri, Valerie Mapelli, Daniele Falavigna, Antonio Cid, Claude Blum
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)