Rachele Sprugnoli

Also published as: R. Sprugnoli


2022

pdf bib
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages
Rachele Sprugnoli | Marco Passarotti
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages

pdf
Overview of the EvaLatin 2022 Evaluation Campaign
Rachele Sprugnoli | Marco Passarotti | Flavio Massimiliano Cecchini | Margherita Fantoli | Giovanni Moretti
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages

This paper describes the organization and the results of the second edition of EvaLatin, the campaign for the evaluation of Natural Language Processing tools for Latin. The three shared tasks proposed in EvaLatin 2022, i.e., Lemmatization, Part-of-Speech Tagging and Features Identification, are aimed at fostering research in the field of language technologies for Classical languages. The shared dataset consists of texts mainly taken from the LASLA corpus. More specifically, the training set includes only prose texts of the Classical period, whereas the test set is organized in three sub-tasks: a Classical sub-task on a prose text by an author not included in the training data, a Cross-genre sub-task on poetic and scientific texts, and a Cross-time sub-task on a text of the 15th century. The results obtained by the participants for each task and sub-task are presented and discussed.

pdf bib
Proceedings of the 2nd Workshop on Sentiment Analysis and Linguistic Linked Data
Ilan Kernerman | Sara Carvalho | Carlos A. Iglesias | Rachele Sprugnoli
Proceedings of the 2nd Workshop on Sentiment Analysis and Linguistic Linked Data

2020

pdf bib
Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages
Rachele Sprugnoli | Marco Passarotti
Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages

pdf
Overview of the EvaLatin 2020 Evaluation Campaign
Rachele Sprugnoli | Marco Passarotti | Flavio Massimiliano Cecchini | Matteo Pellegrini
Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages

This paper describes the first edition of EvaLatin, a campaign entirely devoted to the evaluation of NLP tools for Latin. The two shared tasks proposed in EvaLatin 2020, i.e., Lemmatization and Part-of-Speech Tagging, are aimed at fostering research in the field of language technologies for Classical languages. The shared dataset consists of texts taken from the Perseus Digital Library, processed with UDPipe models and then manually corrected by Latin experts. The training set includes only prose texts by Classical authors. The test set, alongside prose texts by the same authors represented in the training set, also includes data relating to poetry and to the Medieval period. This also allows us to propose Cross-genre and Cross-time subtasks for each task, in order to evaluate the portability of NLP tools for Latin across different genres and time periods. The results obtained by the participants for each task and subtask are presented and discussed.

pdf
Odi et Amo. Creating, Evaluating and Extending Sentiment Lexicons for Latin.
Rachele Sprugnoli | Marco Passarotti | Daniela Corbetta | Andrea Peverelli
Proceedings of the Twelfth Language Resources and Evaluation Conference

Sentiment lexicons are essential for developing automatic sentiment analysis systems, but the resources currently available mostly cover modern languages. Lexicons for ancient languages are few and have not been evaluated against high-quality gold standards. However, the study of attitudes and emotions in ancient texts is a growing field of research which poses specific issues (e.g., lack of native speakers, limited amount of data, textual genres unusual for the sentiment analysis task, such as philosophical or documentary texts) and can have an impact on the work of scholars from several disciplines besides computational linguistics, e.g. historians and philologists. The work presented in this paper aims at providing the research community with a set of sentiment lexicons built by taking advantage of manually curated resources belonging to the long tradition of Latin corpus and lexicon creation. Our interdisciplinary approach led us to release: i) two automatically generated sentiment lexicons; ii) a gold standard developed by two Latin language and culture experts; iii) a silver standard in which semantic and derivational relations are exploited so as to extend the list of lexical items of the gold standard. In addition, the evaluation procedure is described together with a first application of the lexicons to a Latin tragedy.

2019

pdf bib
Novel Event Detection and Classification for Historical Texts
Rachele Sprugnoli | Sara Tonelli
Computational Linguistics, Volume 45, Issue 2 - June 2019

Event processing is an active area of research in the Natural Language Processing community, but the resources and automatic systems developed so far have mainly addressed contemporary texts. However, the recognition and elaboration of events is a crucial step when dealing with historical texts, particularly in the current era of massive digitization of historical sources: research in this domain can lead to the development of methodologies and tools that can assist historians in enhancing their work, while also having an impact on the field of Natural Language Processing. Our work aims at shedding light on the complex concept of events when dealing with historical texts. More specifically, we introduce new annotation guidelines for event mentions and types, categorized into 22 classes. Then, we annotate a historical corpus accordingly, and compare two approaches for automatic event detection and classification following this novel scheme. We believe that this work can foster research in a field of inquiry as yet underestimated in the area of Temporal Information Processing. To this end, we release new annotation guidelines, a corpus, and new models for automatic annotation.

2018

pdf
Creating a WhatsApp Dataset to Study Pre-teen Cyberbullying
Rachele Sprugnoli | Stefano Menini | Sara Tonelli | Filippo Oncini | Enrico Piras
Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)

Although WhatsApp is used by teenagers as one of the major channels of cyberbullying, such interactions remain invisible due to the app's privacy policies, which do not allow ex-post data collection. Indeed, most of the information on these phenomena relies on surveys of self-reported data. In order to overcome this limitation, we describe in this paper the activities that led to the creation of a WhatsApp dataset to study cyberbullying among Italian students aged 12-13. We present not only the collected chats, with annotations about user role and type of offense, but also the living lab created in a collaboration between researchers and schools to monitor and analyse cyberbullying. Finally, we discuss some open issues concerning ethical, operational and epistemic aspects.

2017

pdf
The Content Types Dataset: a New Resource to Explore Semantic and Functional Characteristics of Texts
Rachele Sprugnoli | Tommaso Caselli | Sara Tonelli | Giovanni Moretti
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

This paper presents a new resource, called Content Types Dataset, to promote the analysis of texts as a composition of units with specific semantic and functional roles. By developing this dataset, we also introduce a new NLP task for the automatic classification of Content Types. The annotation scheme and the dataset are described together with two sets of classification experiments.

pdf
RAMBLE ON: Tracing Movements of Popular Historical Figures
Stefano Menini | Rachele Sprugnoli | Giovanni Moretti | Enrico Bignotti | Sara Tonelli | Bruno Lepri
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

We present RAMBLE ON, an application integrating a pipeline for frame-based information extraction and an interface to track and display movement trajectories. The code of the extraction pipeline and a navigator are freely available; moreover, we display in a demonstrator the outcome of a case study carried out on the trajectories of notable persons of the XX century.

2016

pdf
NLP and Public Engagement: The Case of the Italian School Reform
Tommaso Caselli | Giovanni Moretti | Rachele Sprugnoli | Sara Tonelli | Damien Lanfrey | Donatella Solda Kutzmann
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we present PIERINO (PIattaforma per l’Estrazione e il Recupero di INformazione Online), a system implemented in collaboration with the Italian Ministry of Education, University and Research to analyse citizens’ comments from the #labuonascuola survey. The platform includes various levels of automatic analysis, such as key-concept extraction and word co-occurrences. Each analysis is displayed through an intuitive view using different types of visualizations, for example radar charts and sunbursts. PIERINO was effectively used to support the shaping of the latest Italian school reform, proving the potential of NLP in the context of policy making.

pdf
“Who was Pietro Badoglio?” Towards a QA system for Italian History
Stefano Menini | Rachele Sprugnoli | Antonio Uva
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents QUANDHO (QUestion ANswering Data for italian HistOry), an Italian question answering dataset created to cover a specific domain, i.e. the history of Italy in the first half of the XX century. The dataset includes questions manually classified and annotated with Lexical Answer Types, and a set of question-answer pairs. This resource, freely available for research purposes, has been used to retrain a domain-independent question answering system so as to improve its performance in the domain of interest. Ongoing experiments on the development of a question classifier and an automatic tagger of Lexical Answer Types are also presented.

pdf
Temporal Information Annotation: Crowd vs. Experts
Tommaso Caselli | Rachele Sprugnoli | Oana Inel
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes two sets of crowdsourcing experiments on temporal information annotation conducted in two languages, i.e., English and Italian. The first experiment, launched on the CrowdFlower platform, was aimed at classifying temporal relations given target entities. The second one, relying on the CrowdTruth metric, consisted of two subtasks: one devoted to the recognition of events and temporal expressions and one to the detection and classification of temporal relations. The outcomes of the experiments suggest that crowdsourced annotations can be valuable even for a complex task like Temporal Processing.

2014

pdf bib
Annotating Causality in the TempEval-3 Corpus
Paramita Mirza | Rachele Sprugnoli | Sara Tonelli | Manuela Speranza
Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL)

pdf
Crowdsourcing for the identification of event nominals: an experiment
Rachele Sprugnoli | Alessandro Lenci
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents the design and results of a crowdsourcing experiment on the recognition of Italian event nominals. The aim of the experiment was to assess the feasibility of crowdsourcing methods for a complex semantic task such as distinguishing the eventive interpretation of polysemous nominals, taking into consideration various types of syntagmatic cues. Details on the theoretical background and on the experimental setup are provided, together with the final results in terms of accuracy and inter-annotator agreement. These results are compared with those obtained by expert annotators on the same task. The low accuracy and Fleiss’ kappa values of the crowdsourcing experiment show that crowdsourcing is not always optimal for complex linguistic tasks. On the other hand, the use of non-expert contributors makes it possible to understand which are the most ambiguous patterns of polysemy and the most useful syntagmatic cues for identifying the eventive reading of nominals.

pdf
CROMER: a Tool for Cross-Document Event and Entity Coreference
Christian Girardi | Manuela Speranza | Rachele Sprugnoli | Sara Tonelli
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we present CROMER (CROss-document Main Events and entities Recognition), a novel tool to manually annotate event and entity coreference across clusters of documents. The tool has been developed so as to handle large collections of documents, perform collaborative annotation (several annotators can work on the same clusters), and enable the linking of the annotated data to external knowledge sources. Given the availability of semantic information encoded in Semantic Web resources, this tool is designed to support annotators in linking entities and events to DBpedia and Wikipedia, so as to facilitate the automatic retrieval of additional semantic information. In this way, event modelling and chaining is made easy, while guaranteeing the highest interconnection with external resources. For example, the tool can be easily linked to event models such as the Simple Event Model [van Hage et al., 2011] and the Grounded Annotation Framework [Fokkens et al., 2013].

2013

pdf bib
GAF: A Grounded Annotation Framework for Events
Antske Fokkens | Marieke van Erp | Piek Vossen | Sara Tonelli | Willem Robert van Hage | Luciano Serafini | Rachele Sprugnoli | Jesper Hoeksema
Workshop on Events: Definition, Detection, Coreference, and Representation

2012

pdf
CAT: the CELCT Annotation Tool
Valentina Bartalesi Lenzi | Giovanni Moretti | Rachele Sprugnoli
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents CAT - CELCT Annotation Tool, a new general-purpose web-based tool for text annotation developed by CELCT (Center for the Evaluation of Language and Communication Technologies). The aim of CAT is to make text annotation an intuitive, easy and fast process. In particular, CAT was created to support human annotators in performing linguistic and semantic text annotation and was designed to improve productivity and reduce the time spent on this task. Manual text annotation is, in fact, a time-consuming activity, and conflicts may arise with the strict deadlines to which annotation projects are frequently subject. Thanks to its adaptability and user-friendly interface, CAT can positively contribute to improving time management in annotation projects. Moreover, the tool has a number of features which make it easy to use for many types of annotation. Although the first prototype of CAT was used to perform temporal and event annotation following the It-TimeML specifications, the tool is general enough to be used for annotating a broad range of linguistic and semantic phenomena. CAT is freely available for research purposes.

2011

pdf
Annotating Events, Temporal Expressions and Relations in Italian: the It-Timeml Experience for the Ita-TimeBank
Tommaso Caselli | Valentina Bartalesi Lenzi | Rachele Sprugnoli | Emanuele Pianta | Irina Prodanof
Proceedings of the 5th Linguistic Annotation Workshop

2008

pdf
Evaluation of Natural Language Tools for Italian: EVALITA 2007
Bernardo Magnini | Amedeo Cappelli | Fabio Tamburini | Cristina Bosco | Alessandro Mazzei | Vincenzo Lombardo | Francesca Bertagna | Nicoletta Calzolari | Antonio Toral | Valentina Bartalesi Lenzi | Rachele Sprugnoli | Manuela Speranza
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

EVALITA 2007, the first edition of the initiative devoted to the evaluation of Natural Language Processing tools for Italian, provided a shared framework in which participants’ systems could be evaluated on five different tasks, namely Part of Speech Tagging (organised by the University of Bologna), Parsing (organised by the University of Torino), Word Sense Disambiguation (organised by CNR-ILC, Pisa), Temporal Expression Recognition and Normalization (organised by CELCT, Trento), and Named Entity Recognition (organised by FBK, Trento). We believe that the diffusion of shared tasks and shared evaluation practices is a crucial step towards the development of resources and tools for Natural Language Processing. Experiences of this kind are, in fact, a valuable contribution to the validation of existing models and data, allowing for consistent comparisons among approaches and among representation schemes. The good response obtained by EVALITA, both in the number of participants and in the quality of results, showed that pursuing such goals is feasible not only for English, but also for other languages.

2006

pdf
I-CAB: the Italian Content Annotation Bank
B. Magnini | E. Pianta | C. Girardi | M. Negri | L. Romano | M. Speranza | V. Bartalesi Lenzi | R. Sprugnoli
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we present work in progress on the creation of the Italian Content Annotation Bank (I-CAB), a corpus of Italian news annotated with semantic information at different levels. The first level is represented by temporal expressions, the second by different types of entities (i.e. persons, organizations, locations and geo-political entities), and the third by relations between entities (e.g. the affiliation relation connecting a person to an organization). So far I-CAB has been manually annotated with temporal expressions, person entities and organization entities. As we intend I-CAB to become a benchmark for various automatic Information Extraction tasks, we followed a policy of reusing already available markup languages. In particular, we adopted the annotation schemes developed for the ACE Entity Detection and Time Expressions Recognition and Normalization tasks. As the ACE guidelines were originally developed for English, part of the effort consisted in adapting them to the specific morpho-syntactic features of Italian. Finally, we have extended them to include a wider range of entities, such as conjunctions.

pdf bib
Multilingual Extension of a Temporal Expression Normalizer using Annotated Corpora
E. Saquete | P. Martínez-Barco | R. Muñoz | M. Negri | M. Speranza | R. Sprugnoli
Proceedings of the Cross-Language Knowledge Induction Workshop