Fabio Tamburini

Also published as: F. Tamburini


2019

A Quantum-Like Approach to Word Sense Disambiguation
Fabio Tamburini
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

This paper presents a novel algorithm for Word Sense Disambiguation (WSD) based on Quantum Probability Theory. The Quantum WSD algorithm requires concept representations as vectors in the complex domain, and thus we have developed a technique for computing complex word and sentence embeddings based on the Paragraph Vectors algorithm. Although the proposed method is quite simple and does not require long training phases, it exhibits state-of-the-art (SOTA) performance when evaluated on a standardized benchmark for this task.

2018

PoSTWITA-UD: an Italian Twitter Treebank in Universal Dependencies
Manuela Sanguinetti | Cristina Bosco | Alberto Lavelli | Alessandro Mazzei | Oronzo Antonelli | Fabio Tamburini
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Towards Quantum Language Models
Ivano Basile | Fabio Tamburini
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This paper presents a new approach for building Language Models using Quantum Probability Theory: a Quantum Language Model (QLM). It mainly shows that, by relying on this probability calculus, it is possible to build stochastic models able to benefit from quantum correlations due to interference and entanglement. We extensively tested our approach, showing its superior performance compared with state-of-the-art language modelling techniques, both in terms of model perplexity and when inserted into an automatic speech recognition evaluation setting.

Annotating Italian Social Media Texts in Universal Dependencies
Manuela Sanguinetti | Cristina Bosco | Alessandro Mazzei | Alberto Lavelli | Fabio Tamburini
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

Semgrex-Plus: a Tool for Automatic Dependency-Graph Rewriting
Fabio Tamburini
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

2016

Specialising Paragraph Vectors for Text Polarity Detection
Fabio Tamburini
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents some experiments for specialising Paragraph Vectors, a new technique for creating text fragment (phrase, sentence, paragraph, text, ...) embedding vectors, for text polarity detection. The first extension injects polarity information extracted from a polarity lexicon into the embeddings, and the second inserts word order information into Paragraph Vectors. These two extensions, when training a logistic-regression classifier on the combined embeddings, produced a relevant gain in performance compared to the standard Paragraph Vector methods proposed by Le and Mikolov (2014).

Automatic identification of Mild Cognitive Impairment through the analysis of Italian spontaneous speech productions
Daniela Beltrami | Laura Calzà | Gloria Gagliardi | Enrico Ghidoni | Norina Marcello | Rema Rossini Favretti | Fabio Tamburini
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents some preliminary results of the OPLON project, which aimed at identifying early linguistic symptoms of cognitive decline in the elderly. This pilot study was conducted on a corpus composed of spontaneous speech samples collected from 39 subjects, who underwent a neuropsychological screening for visuo-spatial abilities, memory, language, executive functions and attention. A rich set of linguistic features was extracted from the digitalised utterances (at phonetic, suprasegmental, lexical, morphological and syntactic levels) and the statistical significance in pinpointing the pathological process was measured. Our results show remarkable trends concerning both the selection of linguistic traits and the building of automatic classifiers.

2012

AnIta: a powerful morphological analyser for Italian
Fabio Tamburini | Matias Melandri
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper we present AnIta, a powerful morphological analyser for Italian implemented within the framework of finite-state-automata models. It is provided with a large lexicon containing more than 110,000 lemmas that enables it to cover relevant portions of Italian texts. We describe our design choices for the management of inflectional phenomena as well as some interesting new features to explicitly handle derivational and compositional processes in Italian, namely the wordform segmentation structure and Derivation Graph. Two different evaluation experiments, for testing coverage (Recall) and Precision, are described in detail, comparing the performance of AnIta with some other freely available tools for handling Italian morphology. The experimental results show that the AnIta Morphological Analyser obtains the best performance among the tested systems, with Recall = 97.21% and Precision = 98.71%. This tool was a fundamental building block for designing a high-performance PoS-tagger and Lemmatiser for the Italian language that participated in two EVALITA evaluation campaigns, ranking in both cases among the best performing systems.

A topologic view of Topic and Focus marking in Italian
Gloria Gagliardi | Edoardo Lombardi Vallauri | Fabio Tamburini
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Regularities in position and level of prosodic prominences associated with patterns of Information Structure are identified for some Italian varieties. The experiments' results suggest a possibly new structural hypothesis on the role and function of the main prominence in marking information patterns. (1) An abstract and merely structural, "topologic" concept of Prominence location can be conceived of, as endowed with the function of demarcation between units, before their culmination and "description". This may suffice to explain much of the process by which speakers interpret the IS of utterances in discourse. Further features, such as the specific intonational contours of the different IS units, may thus represent a certain amount of redundancy. (2) Real utterances do not always signal the distribution of Topic and Focus clearly. Acoustically, many remain underspecified in this respect. This is especially true for the distinction between Topic-Focus and Broad Focus, which indeed often has no serious effects on the progression of communicative dynamism in the subsequent discourse. (3) The consistency of such results with the law of least effort, and the very high percentage of matches between perceptual evaluations and automatic measurement, seem to validate the algorithm used.

2008

Evaluation of Natural Language Tools for Italian: EVALITA 2007
Bernardo Magnini | Amedeo Cappelli | Fabio Tamburini | Cristina Bosco | Alessandro Mazzei | Vincenzo Lombardo | Francesca Bertagna | Nicoletta Calzolari | Antonio Toral | Valentina Bartalesi Lenzi | Rachele Sprugnoli | Manuela Speranza
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

EVALITA 2007, the first edition of the initiative devoted to the evaluation of Natural Language Processing tools for Italian, provided a shared framework where participants’ systems could be evaluated on five different tasks, namely Part of Speech Tagging (organised by the University of Bologna), Parsing (organised by the University of Torino), Word Sense Disambiguation (organised by CNR-ILC, Pisa), Temporal Expression Recognition and Normalization (organised by CELCT, Trento), and Named Entity Recognition (organised by FBK, Trento). We believe that the diffusion of shared tasks and shared evaluation practices is a crucial step towards the development of resources and tools for Natural Language Processing. Experiences of this kind, in fact, are a valuable contribution to the validation of existing models and data, allowing for consistent comparisons among approaches and among representation schemes. The good response obtained by EVALITA, both in the number of participants and in the quality of results, showed that pursuing such goals is feasible not only for English, but also for other languages.

2006

POS tagset design for Italian
Raffaella Bernardi | Andrea Bolognesi | Corrado Seidenari | Fabio Tamburini
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We aim to automatically induce a PoS tagset for Italian by analysing the distributional behaviour of Italian words. To this end, we propose an algorithm that (a) extracts information from loosely labelled dependency structures that encode only basic and broadly accepted syntactic relations, namely Head/Dependent and the distinction of dependents into Argument vs. Adjunct, and (b) derives a possible set of word classes. The paper reports on some preliminary experiments carried out using the induced tagset in conjunction with state-of-the-art PoS taggers. The method proposed to design a proper tagset exploits little, if any, language-specific knowledge: hence it is in principle applicable to any language.

The DiaCORIS project: a diachronic corpus of written Italian
C. Onelli | D. Proietti | C. Seidenari | F. Tamburini
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The DiaCORIS project aims at the construction of a diachronic corpus comprising written Italian texts produced between 1861 and 1945, extending the structure and the research possibilities of the synchronic 100-million-word corpus CORIS/CODIS. A preliminary in-depth study has been performed in order to design a representative and well-balanced sample of the Italian language over a time period that contains all the main events of contemporary Italian history from the National Unification to the end of the Second World War. The paper describes in detail such design processes as the definition of the main subcorpora and their proportions, the type of documents inserted in each part of the corpus, the document annotation schema and the technological infrastructure designed to manage the corpus access as well as the web interface to corpus data.

2005

Automatic Induction of a POS Tagset for Italian
Raffaella Bernardi | Andrea Bolognesi | Corrado Seidenari | Fabio Tamburini
Proceedings of the Australasian Language Technology Workshop 2005

2004

Categorial Type Logic meets Dependency Grammar to annotate an Italian corpus
R. Bernardi | A. Bolognesi | F. Tamburini | M. Moortgat
Proceedings of the Workshop on Recent Advances in Dependency Grammar

Building Distributed Language Resources By Grid Computing
Fabio Tamburini
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

Automatic detection of prosodic prominence in continuous speech
Fabio Tamburini
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

A dynamic model for reference corpora structure definition
Fabio Tamburini
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)