Luigi Di Caro

Also published as: Luigi di Caro

2020

This paper is concerned with the goal of maintaining legal information and compliance systems: the ‘resource consumption bottleneck’ of creating semantic technologies manually. The use of automated information extraction techniques could significantly reduce this bottleneck. The research question of this paper is: How to address the resource bottleneck problem of creating specialist knowledge management systems? In particular, how to semi-automate the extraction of norms and their elements to populate legal ontologies? This paper shows that the acquisition paradox can be addressed by combining state-of-the-art general-purpose NLP modules with pre- and post-processing using rules based on domain knowledge. It describes a Semantic Role Labeling based information extraction system to extract norms from legislation and represent them as structured norms in legal ontologies. The output is intended to help make laws more accessible, understandable, and searchable in legal document management systems such as Eunomos (Boella et al., 2016).

pdf abs
Building Semantic Grams of Human Knowledge
Valentina Leone | Giovanni Siragusa | Luigi Di Caro | Roberto Navigli
Proceedings of the Twelfth Language Resources and Evaluation Conference

Word senses are typically defined with textual definitions for human consumption and, in computational lexicons, put in context via lexical-semantic relations such as synonymy, antonymy, hypernymy, etc. In this paper we embrace a radically different paradigm that provides a slot-filler structure, called “semagram”, to define the meaning of words in terms of their prototypical semantic information. We propose a semagram-based knowledge model composed of 26 semantic relationships which integrates features from a range of different sources, such as computational lexicons and property norms. We describe an annotation exercise regarding 50 concepts over 10 different categories and put forward different automated approaches for extending the semagram base to thousands of concepts. We finally evaluated the impact of the proposed resource on a semantic similarity task, showing significant improvements over state-of-the-art word embeddings.

2019

pdf abs
Real Life Application of a Question Answering System Using BERT Language Model
Francesca Alloatti | Luigi Di Caro | Gianpiero Sportelli
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

It is often hard to apply the newest advances in research to real life scenarios. They usually require the resolution of some specific task applied to a restricted domain, all the while providing small amounts of data to begin with. In this study we apply one of the newest innovations in Deep Learning to a task of text classification. We created a question answering system in Italian that provides information about a specific subject, e-invoicing and digital billing. Italy recently introduced a new legislation about e-invoicing and people have some legit doubts, therefore a large share of professionals could benefit from this tool.

2016

pdf abs
Automatic Enrichment of WordNet with Common-Sense Knowledge
Luigi Di Caro | Guido Boella
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

WordNet represents a cornerstone in the Computational Linguistics field, linking words to meanings (or senses) through a taxonomical representation of synsets, i.e., clusters of words with an equivalent meaning in a specific context often described by few definitions (or glosses) and examples. Most of the approaches to the Word Sense Disambiguation task fully rely on these short texts as a source of contextual information to match with the input text to disambiguate. This paper presents the first attempt to enrich synsets data with common-sense definitions, automatically retrieved from ConceptNet 5, and disambiguated accordingly to WordNet. The aim was to exploit the shared- and immediate-thinking nature of common-sense knowledge to extend the short but incredibly useful contextual information of the synsets. A manual evaluation on a subset of the entire result (which counts a total of almost 600K synset enrichments) shows a very high precision with an estimated good recall.

pdf
NORMAS at SemEval-2016 Task 1: SEMSIM: A Multi-Feature Approach to Semantic Text Similarity
Kolawole Adebayo | Luigi Di Caro | Guido Boella
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2014

pdf abs
Exploiting networks in Law
Livio Robaldo | Guido Boella | Luigi Di Caro | Andrea Violato
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we first introduce the working context related to the understanding of an heterogeneous network of references contained in the Italian regulatory framework. We then present an extended analysis of a large network of laws, providing several types of analytical evaluation that can be used within a legal management system for understanding the data through summarization, visualization, and browsing. In the legal domain, yet several tasks are strictly supervised by humans, with strong consumption of time and energy that would dramatically drop with the help of automatic or semi-automatic supporting tools. We overview different techniques and methodologies explaining how they can be helpful in actual scenarios.

2013

pdf
Extracting Definitions and Hypernym Relations relying on Syntactic Dependencies and Support Vector Machines
Guido Boella | Luigi Di Caro
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf abs
NLP Challenges for Eunomos a Tool to Build and Manage Legal Knowledge
Guido Boella | Luigi di Caro | Llio Humphreys | Livio Robaldo | Leon van der Torre
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper, we describe how NLP can semi-automate the construction and analysis of knowledge in Eunomos, a legal knowledge management service which enables users to view legislation from various sources and find the right definitions and explanations of legal concepts in a given context. NLP can semi-automate some routine tasks currently performed by knowledge engineers, such as classifying norm, or linking key terms within legislation to ontological concepts. This helps overcome the resource bottleneck problem of creating specialist knowledge management systems. While accuracy is of the utmost importance in the legal domain, and the information should be verified by domain experts as a matter of course, a semi-automated approach can result in considerable efficiency gains.