2022
Towards the Creation of a Diachronic Corpus for Italian: A Case Study on the GDLI Quotations
Manuel Favaro | Elisa Guadagnini | Eva Sassolini | Marco Biffi | Simonetta Montemagni
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages
In this paper we describe some experiments related to a corpus derived from an authoritative historical Italian dictionary, namely the Grande dizionario della lingua italiana (‘Great Dictionary of Italian Language’, in short GDLI). Thanks to the digitization and structuring of this dictionary, we have been able to set up the first nucleus of a diachronic annotated corpus that selects—according to specific criteria, and distinguishing between prose and poetry—some of the quotations that within the entries illustrate the different definitions and sub-definitions. In fact, the GDLI presents a huge collection of quotations covering the entire history of the Italian language and thus ranging from the Middle Ages to the present day. The corpus was enriched with linguistic annotation and used to train and evaluate NLP models for POS tagging and lemmatization, with promising results.
2016
ALT Explored: Integrating an Online Dialectometric Tool and an Online Dialect Atlas
Martijn Wieling | Eva Sassolini | Sebastiana Cucurullo | Simonetta Montemagni
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
In this paper, we illustrate the integration of an online dialectometric tool, Gabmap, with an online dialect atlas, the Atlante Lessicale Toscano (ALT-Web). By using a newly created URL-based interface to Gabmap, ALT-Web is able to take advantage of the sophisticated dialect visualization and exploration options incorporated in Gabmap. For example, maps showing the distribution in the Tuscan dialect area of a specific dialectal form (selected via the ALT-Web website) are easily obtainable. Furthermore, the complete ALT-Web dataset as well as subsets of the data (selected via the ALT-Web website) can be automatically uploaded and explored in Gabmap. By linking these two online applications, macro- and micro-analyses of dialectal data (offered by Gabmap and ALT-Web, respectively) are effectively and dynamically combined.
2010
Cultural Heritage: Knowledge Extraction from Web Documents
Eva Sassolini | Alessandra Cinini
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This article presents the use of NLP techniques (text mining, text analysis) to develop specific tools for creating linguistic resources related to the cultural heritage domain. The aim of our approach is to create tools for building an online knowledge network, automatically extracted from text materials concerning this domain. We experimented with a methodology that divides the automatic acquisition of texts, and consequently the creation of the reference corpus, into two phases. In the first phase, online documents were extracted from lists of links provided by human experts. All documents extracted from the web by means of an automatic spider were stored in a repository of text materials. On the basis of these documents, automatic parsers create the reference corpus for the cultural heritage domain. Relevant information and semantic concepts are then extracted from this corpus. In the second phase, all these semantically relevant elements (such as proper names, names of institutions, names of places, and other relevant terms) were used as the basis for a new search strategy for text materials from heterogeneous sources. In this phase, specialized crawlers (TP-crawler) were also used to process a large volume of text materials available online.
2008
Semantic Press
Eugenio Picchi | Eva Sassolini | Sebastiana Cucurullo | Francesca Bertagna | Paola Baroni
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper introduces Semantic Press, a tool for automatic press review. It is based on text mining technologies and is tailored to meet the needs of the eGovernment and eParticipation communities. First, a general description of the application demands emerging from the eParticipation and eGovernment sectors is offered. Then, an introduction to the framework for the automatic analysis and classification of newspaper content is provided, together with a description of the technologies underlying it.
2006
Next Generation Language Resources using Grid
Federico Calzolari | Eva Sassolini | Manuela Sassi | Sebastiana Cucurullo | Eugenio Picchi | Francesca Bertagna | Alessandro Enea | Monica Monachini | Claudia Soria | Nicoletta Calzolari
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper presents a case study concerning the challenges and requirements posed by next generation language resources, realized as an overall model of open, distributed and collaborative language infrastructure. If a new paradigm for language resource sharing is required, we think that the emerging and still evolving technology connected to Grid computing is a very interesting and suitable one for a concrete realization of this vision. Given the current limitations of Grid computing, it is very important to test the new environment on basic language analysis tools, in order to get a sense of the potential and possible limitations connected to its use in NLP. For this reason, we ran some experiments on a module of the Linguistic Miner, namely the extraction of linguistic patterns from restricted domain corpora. The Grid environment produced the expected results (reduction of the processing time, huge storage capacity, data redundancy) without any additional cost for the final user.
Dialectal resources on-line: the ALT-Web experience
Nella Cucurullo | Simonetta Montemagni | Matilde Paoli | Eugenio Picchi | Eva Sassolini
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
The paper presents an on-line dialectal resource, ALT-Web, which gives access to the linguistic data of the Atlante Lessicale Toscano, a specially designed linguistic atlas in which lexical data have both a diatopic and diastratic characterisation. The paper focuses on: the dialectal data representation model; the access modalities to the ALT dialectal corpus; ontology-based search.
2004
Linguistic Miner: An Italian Linguistic Knowledge System
Eugenio Picchi | Maria Luigia Ceccotti | Sebastiana Cucurullo | Manuela Sassi | Eva Sassolini
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
2002
Italian Arabic linguistic tools
Eugenio Picchi | Eva Sassolini | Ouafae Nahli | Sebastiana Cucurullo | M. Isabel Vargas
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)