2023
UINAUIL: A Unified Benchmark for Italian Natural Language Understanding
Valerio Basile | Livio Bioglio | Alessio Bosca | Cristina Bosco | Viviana Patti
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
This paper introduces the Unified Interactive Natural Understanding of the Italian Language (UINAUIL), a benchmark of six tasks for Italian Natural Language Understanding. We describe the tasks and a software library that collects the data from the European Language Grid, harmonizes the data format, and exposes functionality to facilitate data manipulation and the evaluation of custom models. We also present the results of tests conducted with available Italian and multilingual language models on UINAUIL, providing an updated picture of the current state of the art in Italian NLU.
2014
Modeling, Managing, Exposing, and Linking Ontologies with a Wiki-based Tool
Mauro Dragoni | Alessio Bosca | Matteo Casu | Andi Rexha
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In the last decade, the need for effective and useful tools for creating and managing linguistic resources has increased significantly. One of the main reasons is the necessity of building linguistic resources (LRs) that, besides effectively expressing the domain that users want to model, may be exploited in several ways. In this paper we present MoKi, a wiki-based collaborative tool for modeling ontologies and, more generally, any kind of linguistic resource. The tool has been customized in the context of an EU-funded project to address three important aspects of LR modeling: (i) exposing the created LRs, (ii) linking the created resources to external ones, and (iii) producing multilingual LRs in a safe manner.
A Gold Standard for CLIR evaluation in the Organic Agriculture Domain
Alessio Bosca | Matteo Casu | Matteo Dragoni | Nikolaos Marianos
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
We present a gold standard for the evaluation of Cross Language Information Retrieval systems in the domain of Organic Agriculture and AgroEcology. The resource is free to use for research purposes and includes a collection of multilingual documents annotated with respect to a domain ontology, the ontology used for annotating the resources, a set of 48 queries in 12 languages, and a gold standard with the correct resources for the proposed queries. The goal of this work is to provide the research community with a resource for evaluating multilingual retrieval algorithms, with particular focus on domain adaptation strategies for general-purpose multilingual information retrieval systems and on the effective exploitation of semantic annotations. Domain adaptation is in fact an important activity for tuning a retrieval system, reducing ambiguities and improving the precision of information retrieval. Domain ontologies are a widespread means of defining the conceptual space of a corpus and mapping resources to specific topics; in our lab, we also propose to investigate and evaluate the impact of this information on the retrieval of contents. An initial experiment is described, giving a baseline for further research with the proposed gold standard.
A Lightweight Terminology Verification Service for External Machine Translation Engines
Alessio Bosca | Vassilina Nikoulina | Marc Dymetman
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics
2013
Celi: EDITS and Generic Text Pair Classification
Milen Kouylekov | Luca Dini | Alessio Bosca | Marco Trevisan
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)
2012
Linguagrid: a network of Linguistic and Semantic Services for the Italian Language
Alessio Bosca | Luca Dini | Milen Kouylekov | Marco Trevisan
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
To handle the increasing amount of textual information available on the web today and exploit the knowledge latent in this mass of unstructured data, a wide variety of linguistic knowledge and resources (Language Identification, Morphological Analysis, Entity Extraction, etc.) is crucial. In the last decade, LRaaS (Language Resource as a Service) emerged as a novel paradigm for publishing and sharing these heterogeneous software resources over the Web. In this paper we present an overview of Linguagrid, a recent initiative that implements an open network of linguistic and semantic Web Services for the Italian language, as well as a new approach for enabling customizable corpus-based linguistic services on the Linguagrid LRaaS infrastructure. A corpus ingestion service allows users to upload corpora of documents and to generate classification and clustering models tailored to their needs by means of standard machine learning techniques applied to the textual contents and metadata of the corpora. The models so generated can then be accessed through dedicated Web Services and used to process and classify new textual contents.
CELI: An Experiment with Cross Language Textual Entailment
Milen Kouylekov | Luca Dini | Alessio Bosca | Marco Trevisan
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)