Valeria Quochi


2018

The DLDP Survey on Digital Use and Usability of EU Regional and Minority Languages
Claudia Soria | Valeria Quochi | Irene Russo
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Fostering digital representation of EU regional and minority languages: the Digital Language Diversity Project
Claudia Soria | Irene Russo | Valeria Quochi | Davyth Hicks | Antton Gurrutxaga | Anneli Sarhimaa | Matti Tuomisto
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Poor digital representation of minority languages further hinders their usability on digital media and devices. The Digital Language Diversity Project, a three-year project funded under the Erasmus+ programme, aims to address the low digital representation of EU regional and minority languages by giving their speakers the intellectual and practical skills to create, share, and reuse online digital content. Availability of digital content and technical support for using it are essential prerequisites for the development of language-based digital applications, which in turn can boost the digital use of these languages. In this paper we introduce the project, its aims, objectives and current activities for sustaining the digital usability of minority languages through adult education.

2014

Polysemy Index for Nouns: an Experiment on Italian using the PAROLE SIMPLE CLIPS Lexical Database
Francesca Frontini | Valeria Quochi | Sebastian Padó | Monica Monachini | Jason Utt
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

An experiment is presented to induce a set of polysemous basic type alternations (such as Animal-Food, or Building-Institution) by deriving them from the sense alternations found in an existing lexical resource. The paper builds on previous work and applies those results to the Italian lexicon PAROLE SIMPLE CLIPS. The new results show that the set of frequent type alternations that can be induced from the lexicon partly differs from the set of polysemy relations selected and explicitly applied by lexicographers when building it. The analysis of mismatches shows that frequent type alternations do not always correspond to prototypical polysemy relations; nevertheless, the proposed methodology is a useful tool that lets lexicographers systematically check for possible gaps in their resource.
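
The induction step can be pictured with a minimal sketch. It makes three assumptions that do not come from the paper:
- each noun sense carries a single semantic type;
- any unordered pair of types found among the senses of one lemma counts as a candidate alternation;
- pairs attested on at least `min_count` lemmas are kept as frequent.

The toy lexicon entries and the threshold are hypothetical and do not come from PAROLE SIMPLE CLIPS. The code is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon: lemma -> semantic types of its senses.
# Illustrative only; these entries are NOT taken from PAROLE SIMPLE CLIPS.
LEXICON = {
    "pollo": ["Animal", "Food"],
    "coniglio": ["Animal", "Food"],
    "agnello": ["Animal", "Food"],
    "banca": ["Building", "Institution"],
    "scuola": ["Building", "Institution"],
    "lingua": ["BodyPart", "Language"],
}


def count_alternations(lexicon):
    """Count each unordered pair of distinct types found on the same lemma."""
    counts = Counter()
    for types in lexicon.values():
        # sorted(set(...)) gives a canonical order, so (A, B) and (B, A) coincide
        for pair in combinations(sorted(set(types)), 2):
            counts[pair] += 1
    return counts


def frequent_alternations(lexicon, min_count=2):
    """Keep the alternations attested on at least `min_count` lemmas (hypothetical threshold)."""
    return {pair: n for pair, n in count_alternations(lexicon).items() if n >= min_count}


print(frequent_alternations(LEXICON))
```

With this toy data, Animal-Food (three lemmas) and Building-Institution (two lemmas) survive the threshold, while the single BodyPart-Language pair does not.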

From Synsets to Videos: Enriching ItalWordNet Multimodally
Roberto Bartolini | Valeria Quochi | Irene De Felice | Irene Russo | Monica Monachini
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The paper describes the multimodal enrichment of ItalWordNet action verbs’ entries by means of an automatic mapping to an ontology of action types instantiated by video scenes (ImagAct). The two resources present important differences as well as interesting complementary features, such that mapping them can lead to an enrichment of IWN by connecting synsets to videos that illustrate the meaning described by the glosses. Here, we describe an approach inspired by ontology matching methods for automatically mapping ImagAct video scenes onto ItalWordNet senses. The experiments described in the paper are conducted on Italian, but the same methodology can be extended to other languages for which WordNets have been created, since ImagAct is also available for English, Chinese and Spanish. This source of multimodal information can be exploited to design second language learning tools, as well as for language grounding in video action recognition and, potentially, for robotics.

2013

Generative Lexicon Theory and Linguistic Linked Open Data
Fahad Khan | Francesca Frontini | Riccardo Del Gratta | Monica Monachini | Valeria Quochi
Proceedings of the 6th International Conference on Generative Approaches to the Lexicon (GL2013)

2012

A MWE Acquisition and Lexicon Builder Web Service
Valeria Quochi | Francesca Frontini | Francesco Rubino
Proceedings of COLING 2012

Customizable SCF Acquisition in Italian
Tommaso Caselli | Francesco Rubino | Francesca Frontini | Irene Russo | Valeria Quochi
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Lexica of predicate-argument structures are a useful tool for several NLP tasks. This paper describes a web-service system for the automatic acquisition of verb subcategorization frames (SCFs) from parsed Italian data. The system acquires SCFs in an unsupervised manner. We created two gold standards for evaluating the system: the first by combining information from two lexica (one manually created, the other automatically acquired) with manual exploration of corpus data, and the second by annotating data extracted from a specialized corpus (environmental domain). Data filtering is accomplished by means of the maximum likelihood estimate (MLE). The evaluation phase allowed us to identify the best empirical MLE threshold for the creation of a lexicon (P=0.653, R=0.557, F1=0.601). In addition, we assigned the extracted lexicon entries a confidence score based on relative frequency and evaluated the extractor on domain-specific data. The confidence score will allow the final user to easily select lexicon entries by their reliability: one of the most interesting features of this work is that final users can customize the results of the SCF extractor, obtaining SCF lexica of different size and accuracy.
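
MLE-based filtering can be sketched as follows. The sketch assumes that the estimate is the relative frequency P(SCF | verb) = count(verb, SCF) / count(verb) and that a frame is kept when this value reaches a threshold. The (verb, SCF) observations, the frame labels and the threshold are all hypothetical. This is an illustration of the general technique, not the system described in the paper. The last line checks the reported scores: F1, the harmonic mean of P=0.653 and R=0.557, comes out at about 0.601.

```python
from collections import Counter

# Hypothetical (verb, SCF) observations from parsed data; frame labels are illustrative.
OBSERVATIONS = (
    [("dare", "NP_PP-a")] * 6 + [("dare", "NP")] * 3 + [("dare", "PP-di")] * 1
    + [("parlare", "PP-di")] * 4 + [("parlare", "PP-con")] * 4 + [("parlare", "NP")] * 2
)


def mle_filter(observations, threshold):
    """Return {(verb, scf): P(scf | verb)} for frames whose MLE reaches `threshold`."""
    pair_counts = Counter(observations)
    verb_counts = Counter(verb for verb, _ in observations)
    lexicon = {}
    for (verb, scf), n in pair_counts.items():
        p = n / verb_counts[verb]  # maximum likelihood estimate of P(scf | verb)
        if p >= threshold:
            lexicon[(verb, scf)] = p  # the estimate doubles as a reliability score
    return lexicon


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(mle_filter(OBSERVATIONS, threshold=0.25))
print(round(f1(0.653, 0.557), 3))
```

Raising the threshold yields a smaller but more reliable lexicon. This is the size/accuracy trade-off the abstract says users can tune.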

Towards a User-Friendly Platform for Building Language Resources based on Web Services
Marc Poch | Antonio Toral | Olivier Hamon | Valeria Quochi | Núria Bel
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents the platform developed in the PANACEA project, a distributed factory that automates the stages involved in the acquisition, production, updating and maintenance of Language Resources required by Machine Translation and other Language Technologies. We adopt a set of tools that have been successfully used in the Bioinformatics field; they are adapted to the needs of our field and used to deploy web services, which can be combined to build more complex processing chains (workflows). This paper describes the platform and its different components (web services, registry, workflows, social network and interoperability). We demonstrate the scalability of the platform by carrying out a set of massive data experiments. Finally, a validation of the platform against a set of required criteria proves its usability for different types of users (non-technical users and providers).

Integrating NLP Tools in a Distributed Environment: A Case Study Chaining a Tagger with a Dependency Parser
Francesco Rubino | Francesca Frontini | Valeria Quochi
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The present paper tackles the issue of PoS tag conversion within the framework of a distributed web service platform for the automatic creation of language resources. PoS tagging is now considered a "solved problem"; yet, because of differences in tagsets, interchange between the various available PoS taggers is still hampered. In this paper we describe the implementation of a PoS-tagged-corpus converter, which is needed to chain together in a workflow the Freeling PoS tagger for Italian and the DESR dependency parser, given that these two tools were developed independently. The conversion problems experienced during the implementation, related to the properties of the different tagsets and of tagset conversion in general, are discussed together with the heuristics implemented in the attempt to solve them. Finally, the converter is evaluated by assessing the impact of conversion on the performance of the dependency parser. From this we learn that in most cases parsing errors are due to actual tagging errors, and not to the conversion itself. Moreover, information on accuracy loss is an important feature in a distributed environment of (NLP) services, where users need to decide which services best suit their needs.
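
A tagset converter of the general kind described above can be sketched as a lookup table with a heuristic fallback. The code tries an exact tag match first, then maps on the tag's coarse-category prefix, and counts how many tags needed the fallback as a rough signal of possible accuracy loss. All tag strings and both tables are hypothetical, chosen only for illustration. They are not the actual Freeling or DESR tagsets, and this is not the converter from the paper.

```python
# Hypothetical tag tables: source-tagger tags -> target-parser tags.
# The tag strings are illustrative only and are NOT the actual Freeling or DESR tagsets.
EXACT_MAP = {"NCMS000": "S", "NCFS000": "S", "VMIP3S0": "V", "DA0MS0": "RD"}
# Fallback heuristic: map on the first character (coarse category) of the source tag.
PREFIX_MAP = {"N": "S", "V": "V", "D": "D", "A": "A", "S": "E"}


def convert_tag(tag, unknown="X"):
    """Return (target_tag, how): exact lookup, then prefix fallback, else `unknown`."""
    if tag in EXACT_MAP:
        return EXACT_MAP[tag], "exact"
    if tag[:1] in PREFIX_MAP:
        return PREFIX_MAP[tag[:1]], "heuristic"
    return unknown, "unmapped"


def convert_sentence(tagged):
    """Convert [(token, tag), ...] and count tags that were not mapped exactly."""
    converted, inexact = [], 0
    for token, tag in tagged:
        new_tag, how = convert_tag(tag)
        if how != "exact":
            inexact += 1  # a rough signal of possible accuracy loss downstream
        converted.append((token, new_tag))
    return converted, inexact


sentence = [("il", "DA0MS0"), ("cane", "NCMS000"), ("dorme", "VMIP3S0"),
            ("a", "SPS00"), ("casa", "NCFS000")]
print(convert_sentence(sentence))
```

Here only "SPS00" is missing from the exact table, so it is converted heuristically and the sentence reports one inexact mapping.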

The FLaReNet Strategic Language Resource Agenda
Claudia Soria | Núria Bel | Khalid Choukri | Joseph Mariani | Monica Monachini | Jan Odijk | Stelios Piperidis | Valeria Quochi | Nicoletta Calzolari
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The FLaReNet Strategic Agenda highlights the most pressing needs for the sector of Language Resources and Technologies and presents a set of recommendations for its development and progress in Europe, resulting from the three-year consultation process of the FLaReNet European project. The FLaReNet recommendations are organised around nine dimensions: a) documentation, b) interoperability, c) availability, sharing and distribution, d) coverage, quality and adequacy, e) sustainability, f) recognition, g) development, h) infrastructure and i) international cooperation. As such, they cover a broad range of topics and activities, spanning production and use of language resources, licensing, maintenance and preservation issues, infrastructures for language resources, resource identification and sharing, evaluation and validation, interoperability and policy issues. The intended recipients belong to a large set of players and stakeholders in Language Resources and Technology, ranging from individuals to research and education institutions, to policy-makers, funding agencies, SMEs and large companies, service and media providers. The main goal of these recommendations is to serve as an instrument to support stakeholders in planning for and addressing the urgencies of the Language Resources and Technologies of the future.

2011

Interoperability Framework: The FLaReNet Action Plan Proposal
Nicoletta Calzolari | Monica Monachini | Valeria Quochi
Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm

2010

The LREC Map of Language Resources and Technologies
Nicoletta Calzolari | Claudia Soria | Riccardo Del Gratta | Sara Goggi | Valeria Quochi | Irene Russo | Khalid Choukri | Joseph Mariani | Stelios Piperidis
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we present the LREC Map of Language Resources and Tools, an innovative feature introduced with this LREC. The purpose of the Map is to shed light on the vast amount of resources and tools that represent the background of the research presented at LREC, in an attempt to fill a gap in the community's knowledge about the resources and tools that are used or created worldwide. It also aims at a change of culture in the field, actively engaging each researcher in the task of documenting resources. The Map has been developed on the basis of the information provided by LREC authors during the submission of papers to the LREC 2010 conference and the LREC workshops, and contains information about almost 2000 resources. The paper illustrates the motivation behind this initiative, its main characteristics, its relevance and future impact in the field, and the metadata used to describe the resources, and finally presents some of the most relevant findings.

Capturing Coercions in Texts: a First Annotation Exercise
Elisabetta Jezek | Valeria Quochi
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we report the first results of an annotation exercise of argument coercion phenomena performed on Italian texts. Our corpus consists of ca. 4000 sentences from the PAROLE sottoinsieme corpus (Bindi et al. 2000), annotated with Selection and Coercion relations among verb-noun pairs and formatted in XML according to the Generative Lexicon Mark-up Language (GLML) format (Pustejovsky et al., 2008). For the purposes of coercion annotation, we selected 26 Italian verbs that impose semantic typing on their arguments in Subject, Direct Object or Complement position. Every sentence of the corpus is annotated with the source type for the noun arguments by two annotators plus a judge. An overall agreement of 0.87 kappa indicates that the annotation methodology is reliable. A qualitative analysis of the results allows us to outline some suggestions for improving the task: 1) a different account of complex types for nouns has to be devised and 2) a more comprehensive account of coercion mechanisms requires annotation of the deeper meaning dimensions that are targeted in coercion operations, such as those captured by Qualia relations.
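
Agreement figures such as the 0.87 kappa above are commonly computed with Cohen's kappa for two annotators:

κ = (p_o − p_e) / (1 − p_e)

Here p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label distribution. The abstract does not say which kappa variant was used, so this is only a minimal sketch of the standard two-annotator formula. The labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement p_o: share of items on which both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement p_e: sum over labels of the product of marginal proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for ten verb-noun pairs ("sel" = selection, "coe" = coercion).
annotator_1 = ["sel", "coe", "sel", "sel", "coe", "sel", "sel", "coe", "sel", "sel"]
annotator_2 = ["sel", "coe", "sel", "coe", "coe", "sel", "sel", "coe", "sel", "sel"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

In this toy case the annotators agree on 9 of 10 items (p_o = 0.9). Chance agreement is p_e = (7·6 + 3·4)/100 = 0.54, so κ = 0.36/0.46 ≈ 0.78.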

SemEval-2010 Task 7: Argument Selection and Coercion
James Pustejovsky | Anna Rumshisky | Alex Plotnick | Elisabetta Jezek | Olga Batiukova | Valeria Quochi
Proceedings of the 5th International Workshop on Semantic Evaluation

2008

A lexicon for biology and bioinformatics: the BOOTStrep experience.
Valeria Quochi | Monica Monachini | Riccardo Del Gratta | Nicoletta Calzolari
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes the design, implementation and population of a lexical resource for biology and bioinformatics (the BioLexicon) developed within an ongoing European project. The aim of this project is text-based knowledge harvesting in support of information extraction and text mining in the biomedical domain. The BioLexicon is a large-scale lexical-terminological resource encoding different information types in one single integrated resource. In the design of the resource we follow the ISO/DIS 24613 “Lexical Mark-up Framework” standard, which ensures reusability of the information encoded and easy exchange of both data and architecture. The design of the resource also takes into account the needs of our text mining partners, who automatically extract syntactic and semantic information from texts and feed it to the lexicon. The present contribution first describes in detail the model of the BioLexicon along its three main layers (morphology, syntax and semantics); it then briefly describes the database implementation of the model and the population strategy followed within the project, together with an example. The BioLexicon database in fact comes equipped with automatic uploading procedures based on a common exchange XML format, which guarantees that the lexicon can be properly populated with data coming from different sources.

Learning properties of Noun Phrases: from data to functions
Valeria Quochi | Basilio Calderone
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The paper presents two experiments of unsupervised classification of Italian noun phrases. The goal of the experiments is to identify the most prominent contextual properties that allow for a functional classification of noun phrases. For this purpose, we trained a Self-Organizing Map on syntactically annotated contexts containing noun phrases. The contexts are defined by means of a set of features representing morpho-syntactic properties of both nouns and their wider contexts. Two types of experiments have been run: one based on noun types and the other based on noun tokens. The results of the type simulation show that when frequency is the most prominent classification factor, the network isolates idiomatic or fixed phrases. The results of the token simulation experiment, instead, show that, of the 36 attributes represented in the original input matrix, only a few are prominent in the re-organization of the map. In particular, key features in the emergent macro-classification are the type of determiner and the grammatical number of the noun. An additional, no less interesting result is an organization into semantic/pragmatic micro-classes. In conclusion, our results confirm the relative prominence of determiner type and grammatical number in the task of noun (phrase) categorization.

2007

Inferring the Semantics of Temporal Prepositions in Italian
Tommaso Caselli | Valeria Quochi
Proceedings of the Fourth ACL-SIGSEM Workshop on Prepositions

2004

Representing Italian Complex Nominals: A Pilot Study
Valeria Quochi
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)