Stefano Faralli


2022

pdf
Universal Semantic Annotator: the First Unified API for WSD, SRL and Semantic Parsing
Riccardo Orlando | Simone Conia | Stefano Faralli | Roberto Navigli
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we present the Universal Semantic Annotator (USeA), which offers the first unified API for high-quality automatic annotations of texts in 100 languages through state-of-the-art systems for Word Sense Disambiguation, Semantic Role Labeling and Semantic Parsing. Together, such annotations can be used to provide users with rich and diverse semantic information, help second-language learners, and allow researchers to integrate explicit semantic knowledge into downstream tasks and real-world applications.

pdf
A Large Interlinked Knowledge Graph of the Italian Cultural Heritage
Stefano Faralli | Andrea Lenzi | Paola Velardi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Knowledge is the lifeblood for a plethora of applications such as search, recommender systems and natural language understanding. Thanks to the efforts in the fields of Semantic Web and Linked Open Data a growing number of interlinked knowledge bases are supporting the development of advanced knowledge-based applications. Unfortunately, for a large number of domain-specific applications, these knowledge bases are unavailable. In this paper, we present a resource consisting of a large knowledge graph linking the Italian cultural heritage entities (defined in the ArCo ontology) with the concepts defined on well-known knowledge bases (i.e., DBpedia and the Getty GVP ontology). We describe the methodologies adopted for the semi-automatic resource creation and provide an in-depth analysis of the resulting interlinked graph.

2020

pdf
Multiple Knowledge GraphDB (MKGDB)
Stefano Faralli | Paola Velardi | Farid Yusifli
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present MKGDB, a large-scale graph database created as a combination of multiple taxonomy backbones extracted from 5 existing knowledge graphs, namely: ConceptNet, DBpedia, WebIsAGraph, WordNet and the Wikipedia category hierarchy. MKGDB, thanks the versatility of the Neo4j graph database manager technology, is intended to favour and help the development of open-domain natural language processing applications relying on knowledge bases, such as information extraction, hypernymy discovery, topic clustering, and others. Our resource consists of a large hypernymy graph which counts more than 37 million nodes and more than 81 million hypernymy relations.

pdf
Evaluation Dataset and Methodology for Extracting Application-Specific Taxonomies from the Wikipedia Knowledge Graph
Georgeta Bordea | Stefano Faralli | Fleur Mougin | Paul Buitelaar | Gayo Diallo
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this work, we address the task of extracting application-specific taxonomies from the category hierarchy of Wikipedia. Previous work on pruning the Wikipedia knowledge graph relied on silver standard taxonomies which can only be automatically extracted for a small subset of domains rooted in relatively focused nodes, placed at an intermediate level in the knowledge graphs. In this work, we propose an iterative methodology to extract an application-specific gold standard dataset from a knowledge graph and an evaluation framework to comparatively assess the quality of noisy automatically extracted taxonomies. We employ an existing state of the art algorithm in an iterative manner and we propose several sampling strategies to reduce the amount of manual work needed for evaluation. A first gold standard dataset is released to the research community for this task along with a companion evaluation framework. This dataset addresses a real-world application from the medical domain, namely the extraction of food-drug and herb-drug interactions.

2018

pdf
Enriching Frame Representations with Distributionally Induced Senses
Stefano Faralli | Alexander Panchenko | Chris Biemann | Simone Paolo Ponzetto
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf
Improving Hypernymy Extraction with Distributional Semantic Classes
Alexander Panchenko | Dmitry Ustalov | Stefano Faralli | Simone P. Ponzetto | Chris Biemann
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf
Building a Web-Scale Dependency-Parsed Corpus from CommonCrawl
Alexander Panchenko | Eugen Ruppert | Stefano Faralli | Simone P. Ponzetto | Chris Biemann
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf
MIsA: Multilingual “IsA” Extraction from Corpora
Stefano Faralli | Els Lefever | Simone Paolo Ponzetto
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf
A Large Multilingual and Multi-domain Dataset for Recommender Systems
Giorgia Di Tommaso | Stefano Faralli | Paola Velardi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf
Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation
Alexander Panchenko | Fide Marten | Eugen Ruppert | Stefano Faralli | Dmitry Ustalov | Simone Paolo Ponzetto | Chris Biemann
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Interpretability of a predictive model is a powerful feature that gains the trust of users in the correctness of the predictions. In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts as they rely on the wealth of manually-encoded elements representing word senses, such as hypernyms, usage examples, and images. We present a WSD system that bridges the gap between these two so far disconnected groups of methods. Namely, our system, providing access to several state-of-the-art WSD models, aims to be interpretable as a knowledge-based system while it remains completely unsupervised and knowledge-free. The presented tool features a Web interface for all-word disambiguation of texts that makes the sense predictions human readable by providing interpretable word sense inventories, sense representations, and disambiguation results. We provide a public API, enabling seamless integration.

pdf
Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction and Disambiguation
Alexander Panchenko | Eugen Ruppert | Stefano Faralli | Simone Paolo Ponzetto | Chris Biemann
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

The current trend in NLP is the use of highly opaque models, e.g. neural networks and word embeddings. While these models yield state-of-the-art results on a range of tasks, their drawback is poor interpretability. On the example of word sense induction and disambiguation (WSID), we show that it is possible to develop an interpretable model that matches the state-of-the-art models in accuracy. Namely, we present an unsupervised, knowledge-free WSID approach, which is interpretable at three levels: word sense inventory, sense feature representations, and disambiguation procedure. Experiments show that our model performs on par with state-of-the-art word sense embeddings and other unsupervised systems while offering the possibility to justify its decisions in human-readable form.

pdf
The ContrastMedium Algorithm: Taxonomy Induction From Noisy Knowledge Graphs With Just A Few Links
Stefano Faralli | Alexander Panchenko | Chris Biemann | Simone Paolo Ponzetto
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

In this paper, we present ContrastMedium, an algorithm that transforms noisy semantic networks into full-fledged, clean taxonomies. ContrastMedium is able to identify the embedded taxonomy structure from a noisy knowledge graph without explicit human supervision such as, for instance, a set of manually selected input root and leaf concepts. This is achieved by leveraging structural information from a companion reference taxonomy, to which the input knowledge graph is linked (either automatically or manually). When used in conjunction with methods for hypernym acquisition and knowledge base linking, our methodology provides a complete solution for end-to-end taxonomy induction. We conduct experiments using automatically acquired knowledge graphs, as well as a SemEval benchmark, and show that our method is able to achieve high performance on the task of taxonomy induction.

pdf
Using Linked Disambiguated Distributional Networks for Word Sense Disambiguation
Alexander Panchenko | Stefano Faralli | Simone Paolo Ponzetto | Chris Biemann
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications

We introduce a new method for unsupervised knowledge-based word sense disambiguation (WSD) based on a resource that links two types of sense-aware lexical networks: one is induced from a corpus using distributional semantics, the other is manually constructed. The combination of two networks reduces the sparsity of sense representations used for WSD. We evaluate these enriched representations within two lexical sample sense disambiguation benchmarks. Our results indicate that (1) features extracted from the corpus-based resource help to significantly outperform a model based solely on the lexical resource; (2) our method achieves results comparable or better to four state-of-the-art unsupervised knowledge-based WSD systems including three hybrid systems that also rely on text corpora. In contrast to these hybrid methods, our approach does not require access to web search engines, texts mapped to a sense inventory, or machine translation systems.

2016

pdf
A Large DataBase of Hypernymy Relations Extracted from the Web.
Julian Seitner | Christian Bizer | Kai Eckert | Stefano Faralli | Robert Meusel | Heiko Paulheim | Simone Paolo Ponzetto
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Hypernymy relations (those where an hyponym term shares a “isa” relationship with his hypernym) play a key role for many Natural Language Processing (NLP) tasks, e.g. ontology learning, automatically building or extending knowledge bases, or word sense disambiguation and induction. In fact, such relations may provide the basis for the construction of more complex structures such as taxonomies, or be used as effective background knowledge for many word understanding applications. We present a publicly available database containing more than 400 million hypernymy relations we extracted from the CommonCrawl web corpus. We describe the infrastructure we developed to iterate over the web corpus for extracting the hypernymy relations and store them effectively into a large database. This collection of relations represents a rich source of knowledge and may be useful for many researchers. We offer the tuple dataset for public download and an Application Programming Interface (API) to help other researchers programmatically query the database.

pdf
TAXI at SemEval-2016 Task 13: a Taxonomy Induction Method based on Lexico-Syntactic Patterns, Substrings and Focused Crawling
Alexander Panchenko | Stefano Faralli | Eugen Ruppert | Steffen Remus | Hubert Naets | Cédrick Fairon | Simone Paolo Ponzetto | Chris Biemann
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

pdf
SemEval-2015 Task 17: Taxonomy Extraction Evaluation (TExEval)
Georgeta Bordea | Paul Buitelaar | Stefano Faralli | Roberto Navigli
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2013

pdf
Growing Multi-Domain Glossaries from a Few Seeds using Probabilistic Topic Models
Stefano Faralli | Roberto Navigli
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf
OntoLearn Reloaded: A Graph-Based Algorithm for Taxonomy Induction
Paola Velardi | Stefano Faralli | Roberto Navigli
Computational Linguistics, Volume 39, Issue 3 - September 2013

pdf
GlossBoot: Bootstrapping Multilingual Domain Glossaries from the Web
Flavio De Benedictis | Stefano Faralli | Roberto Navigli
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
A Java Framework for Multilingual Definition and Hypernym Extraction
Stefano Faralli | Roberto Navigli
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2012

pdf
A New Minimally-Supervised Framework for Domain Word Sense Disambiguation
Stefano Faralli | Roberto Navigli
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf
A New Method for Evaluating Automatically Learned Terminological Taxonomies
Paola Velardi | Roberto Navigli | Stefano Faralli | Juana Maria Ruiz Martinez
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Evaluating a taxonomy learned automatically against an existing gold standard is a very complex problem, because differences stem from the number, label, depth and ordering of the taxonomy nodes. In this paper we propose casting the problem as one of comparing two hierarchical clusters. To this end we defined a variation of the Fowlkes and Mallows measure (Fowlkes and Mallows, 1983). Our method assigns a similarity value B^i_(l,r) to the learned (l) and reference (r) taxonomy for each cut i of the corresponding anonymised hierarchies, starting from the topmost nodes down to the leaf concepts. For each cut i, the two hierarchies can be seen as two clusterings C^i_l , C^i_r of the leaf concepts. We assign a prize to early similarity values, i.e. when concepts are clustered in a similar way down to the lowest taxonomy levels (close to the leaf nodes). We apply our method to the evaluation of the taxonomy learning methods put forward by Navigli et al. (2011) and Kozareva and Hovy (2010).