Manuel Favaro
2026
From Print to Digital and beyond: The Retrodigitization of a Historical Dictionary of Italian as a Hybrid Lexical Resource
Marco Biffi | Sebastiana Cucurullo | Manuel Favaro | Elisa Guadagnini | Simonetta Montemagni | Eva Sassolini
Proceedings of the Fifteenth Language Resources and Evaluation Conference
This paper presents the retrodigitization project of the Grande Dizionario della Lingua Italiana (GDLI), the largest and most comprehensive historical dictionary of the Italian language. The GDLI’s 23,000 pages — originally designed for human consultation — constitute an exceptional repository of linguistic and cultural-historical information, while posing significant challenges to large-scale digitization and data structuring. The project, still ongoing, will result in the development of a set of interoperable and interlinked resources: (i) a TEI-XML edition of the dictionary text, encoding its complex lexicographic and citation structure; (ii) an annotated corpus of the quoted examples, enabling linguistic and historical research across centuries; and (iii) a database of cited authors and works. Together, these components form a hybrid lexical resource that establishes the foundations for innovative and advanced modes of accessing and exploring the rich and multifaceted content of this historical dictionary.
2022
Towards the Creation of a Diachronic Corpus for Italian: A Case Study on the GDLI Quotations
Manuel Favaro | Elisa Guadagnini | Eva Sassolini | Marco Biffi | Simonetta Montemagni
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages
In this paper we describe experiments on a corpus derived from an authoritative historical dictionary of Italian, the Grande dizionario della lingua italiana (‘Great Dictionary of the Italian Language’, GDLI for short). Thanks to the digitization and structuring of this dictionary, we have been able to set up the first nucleus of a diachronic annotated corpus. The corpus selects, according to specific criteria and distinguishing between prose and poetry, some of the quotations that illustrate the definitions and sub-definitions within the entries. The GDLI contains a huge collection of quotations covering the entire history of the Italian language, ranging from the Middle Ages to the present day. The corpus was enriched with linguistic annotation and used to train and evaluate NLP models for POS tagging and lemmatization, with promising results.