2024
The Services of the LiLa Knowledge Base of Interoperable Linguistic Resources for Latin
Marco Passarotti | Francesco Mambrini | Giovanni Moretti
Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024
This paper describes three online services designed to ease the tasks of querying and populating the linguistic resources for Latin made interoperable through their publication as Linked Open Data in the LiLa Knowledge Base. As for querying the KB, we present an interface to search the collection of lemmas that represents the core of the Knowledge Base, and an interactive, graphical platform to run queries on the resources currently interlinked. As for populating the KB with new textual resources, we describe a tool that performs automatic tokenization, lemmatization and Part-of-Speech tagging of a raw text in Latin and links its tokens to LiLa.
The Rise and Fall of Dependency Parsing in Dante Alighieri’s Divine Comedy
Claudia Corbetta | Marco Passarotti | Giovanni Moretti
Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024
In this paper, we conduct parsing experiments on Dante Alighieri’s Divine Comedy, an Old Italian poem composed between 1306 and 1321 and organized into three Cantiche: Inferno, Purgatorio, and Paradiso. We parse subsets of the poem using both a Modern Italian training set and sections of the Divine Comedy itself to evaluate under which scenarios parsers achieve higher scores. We find that in-domain training data yields better results, with an increase of approximately 17% in Unlabeled Attachment Score (UAS) and 25-30% in Labeled Attachment Score (LAS). We then comment briefly on the differences in scores across subsections of the Cantiche, and we conduct experimental parsing on a text from the same period and style as the Divine Comedy.
2023
Linking the Neulateinische Wortliste to the LiLa Knowledge Base of Interoperable Resources for Latin
Federica Iurescia | Eleonora Litta | Marco Passarotti | Matteo Pellegrini | Giovanni Moretti | Paolo Ruffolo
Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
This paper describes the process of interlinking a lexical resource consisting of a list of more than 20,000 Neo-Latin words with other resources for Latin. The resources are made interoperable through their linking to the LiLa Knowledge Base, which applies Linguistic Linked Open Data practices and data categories to describe and publish on the Web both textual and lexical resources for the Latin language.
2022
Linking the LASLA Corpus in the LiLa Knowledge Base of Interoperable Linguistic Resources for Latin
Margherita Fantoli | Marco Passarotti | Francesco Mambrini | Giovanni Moretti | Paolo Ruffolo
Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference
This paper describes the process of interlinking the 130 Classical Latin texts provided by an annotated corpus developed at the LASLA laboratory with the LiLa Knowledge Base, which makes linguistic resources for Latin interoperable by following the principles of the Linked Data paradigm and making reference to classes and properties of widely adopted ontologies to model the relevant information. After introducing the overall architecture of the LiLa Knowledge Base and the LASLA corpus, the paper details the phases of the process of linking the corpus with the collection of lemmas of LiLa and presents a federated query to exemplify the added value of interoperability of LASLA’s texts with other resources for Latin.
Overview of the EvaLatin 2022 Evaluation Campaign
Rachele Sprugnoli | Marco Passarotti | Flavio Massimiliano Cecchini | Margherita Fantoli | Giovanni Moretti
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages
This paper describes the organization and the results of the second edition of EvaLatin, the campaign for the evaluation of Natural Language Processing tools for Latin. The three shared tasks proposed in EvaLatin 2022, i.e., Lemmatization, Part-of-Speech Tagging and Features Identification, aim to foster research in the field of language technologies for Classical languages. The shared dataset consists of texts mainly taken from the LASLA corpus. More specifically, the training set includes only prose texts of the Classical period, whereas the test set is organized in three sub-tasks: a Classical sub-task on a prose text of an author not included in the training data, a Cross-genre sub-task on poetic and scientific texts, and a Cross-time sub-task on a text of the 15th century. The results obtained by the participants for each task and sub-task are presented and discussed.
The Index Thomisticus Treebank as Linked Data in the LiLa Knowledge Base
Francesco Mambrini | Marco Passarotti | Giovanni Moretti | Matteo Pellegrini
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Although the Universal Dependencies initiative today allows for cross-linguistically consistent annotation of morphology and syntax in treebanks for several languages, syntactically annotated corpora are not yet interoperable with many lexical resources that describe properties of the words that occur therein. To address this limitation, we propose adopting the principles of the Linguistic Linked Open Data (LLOD) community to describe and publish dependency treebanks as LLOD. In particular, this paper illustrates the approach pursued in the LiLa Knowledge Base, which enables interoperability between corpora and lexical resources for Latin, to publish as Linguistic Linked Open Data the annotation layers of two versions of a Medieval Latin treebank (the Index Thomisticus Treebank).
2019
A System to Monitor Cyberbullying based on Message Classification and Social Network Analysis
Stefano Menini | Giovanni Moretti | Michele Corazza | Elena Cabrio | Sara Tonelli | Serena Villata
Proceedings of the Third Workshop on Abusive Language Online
Social media platforms like Twitter and Instagram face a surge in cyberbullying phenomena against young users and need to develop scalable computational methods to limit the negative consequences of this kind of abuse. Despite the number of approaches recently proposed in the Natural Language Processing (NLP) research area for detecting different forms of abusive language, identifying cyberbullying phenomena at scale is still an unsolved problem. This is because abusive language detection on textual messages must be coupled with network analysis, so that repeated attacks against the same person can be identified. In this paper, we present a system to monitor cyberbullying phenomena by combining message classification and social network analysis. We evaluate the classification module on a data set built on Instagram messages, and we describe the cyberbullying monitoring user interface.
2017
The Content Types Dataset: a New Resource to Explore Semantic and Functional Characteristics of Texts
Rachele Sprugnoli | Tommaso Caselli | Sara Tonelli | Giovanni Moretti
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
This paper presents a new resource, called Content Types Dataset, to promote the analysis of texts as a composition of units with specific semantic and functional roles. By developing this dataset, we also introduce a new NLP task for the automatic classification of Content Types. The annotation scheme and the dataset are described together with two sets of classification experiments.
RAMBLE ON: Tracing Movements of Popular Historical Figures
Stefano Menini | Rachele Sprugnoli | Giovanni Moretti | Enrico Bignotti | Sara Tonelli | Bruno Lepri
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics
We present RAMBLE ON, an application integrating a pipeline for frame-based information extraction and an interface to track and display movement trajectories. The code of the extraction pipeline and a navigator are freely available; moreover, a demonstrator displays the outcome of a case study carried out on the trajectories of notable persons of the 20th century.
2016
NLP and Public Engagement: The Case of the Italian School Reform
Tommaso Caselli | Giovanni Moretti | Rachele Sprugnoli | Sara Tonelli | Damien Lanfrey | Donatella Solda Kutzmann
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
In this paper we present PIERINO (PIattaforma per l’Estrazione e il Recupero di INformazione Online), a system implemented in collaboration with the Italian Ministry of Education, University and Research to analyse the citizens’ comments given in the #labuonascuola survey. The platform includes various levels of automatic analysis, such as key-concept extraction and word co-occurrences. Each analysis is displayed through an intuitive view using different types of visualizations, for example radar charts and sunburst diagrams. PIERINO was effectively used to support shaping the last Italian school reform, proving the potential of NLP in the context of policy making.
2012
CAT: the CELCT Annotation Tool
Valentina Bartalesi Lenzi | Giovanni Moretti | Rachele Sprugnoli
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper presents CAT (CELCT Annotation Tool), a new general-purpose web-based tool for text annotation developed by CELCT (Center for the Evaluation of Language and Communication Technologies). The aim of CAT is to make text annotation an intuitive, easy and fast process. In particular, CAT was created to support human annotators in performing linguistic and semantic text annotation and was designed to improve productivity and reduce time spent on this task. Manual text annotation is, in fact, a time-consuming activity, and conflicts may arise with the strict deadlines annotation projects are frequently subject to. Thanks to its adaptability and user-friendly interface, CAT can contribute to improving time management in annotation projects. Further, the tool has a number of features that make it easy to use for many types of annotation. Although the first prototype of CAT has been used to perform temporal and event annotation following the It-TimeML specifications, the tool is general enough to be used for annotating a broad range of linguistic and semantic phenomena. CAT is freely available for research purposes.
The IWSLT 2011 Evaluation Campaign on Automatic Talk Translation
Marcello Federico | Sebastian Stüker | Luisa Bentivogli | Michael Paul | Mauro Cettolo | Teresa Herrmann | Jan Niehues | Giovanni Moretti
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
We report here on the eighth evaluation campaign organized in 2011 by the IWSLT workshop series. The IWSLT 2011 evaluation focused on the automatic translation of public talks and included tracks for speech recognition, speech translation, text translation, and system combination. Unlike in previous years, all data supplied for the evaluation has been publicly released on the workshop website, and is at the disposal of researchers interested in working on our benchmarks and in comparing their results with those published at the workshop. This paper provides an overview of the IWSLT 2011 evaluation campaign, and describes the data supplied, the evaluation infrastructure made available to participants, and the subjective evaluation carried out.
2011
Getting Expert Quality from the Crowd for Machine Translation Evaluation
Luisa Bentivogli | Marcello Federico | Giovanni Moretti | Michael Paul
Proceedings of Machine Translation Summit XIII: Papers