Marco Rovera
2024
EventNet-ITA: Italian Frame Parsing for Events
Marco Rovera
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)
This paper introduces EventNet-ITA, a large, multi-domain corpus annotated full-text with event frames for Italian. Moreover, we present and thoroughly evaluate an efficient multi-label sequence labeling approach for Frame Parsing. Covering a wide range of individual, social and historical phenomena, with more than 53,000 annotated sentences and over 200 modeled frames, EventNet-ITA constitutes the first systematic attempt to provide the Italian language with a publicly available resource for Frame Parsing of events, useful for a broad spectrum of research and application tasks. Our approach achieves a promising strict F1-score of 0.9 for frame classification and 0.72 for frame element classification, while also minimizing computational requirements. The annotated corpus and the frame parsing model are released under an open license.
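As an illustration of the metric reported above, the following is a minimal sketch of a strict F1-score over labeled spans. It assumes that "strict" means a prediction counts only when both its boundaries and its label exactly match a gold annotation; the abstract does not spell out the definition, so this is an assumption. The function name `strict_f1` and the `(start, end, label)` tuple layout are hypothetical and not taken from the paper.

```python
def strict_f1(gold, pred):
    """Strict F1 over spans given as (start, end, label) tuples.

    Assumption: a predicted span is correct only if its start, end
    and label all match a gold span exactly.
    """
    gold, pred = set(gold), set(pred)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the second prediction has the right label
# but the wrong end offset, so it does not count under strict matching.
gold_spans = {(0, 2, "Agent"), (3, 5, "Theme")}
pred_spans = {(0, 2, "Agent"), (3, 4, "Theme")}
score = strict_f1(gold_spans, pred_spans)
```

Under a more lenient (partial-overlap) scheme the second prediction would receive credit, which is why strict scores are typically the lower of the two.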
2023
Italian Legislative Text Classification for Gazzetta Ufficiale
Marco Rovera | Alessio Palmero Aprosio | Francesco Greco | Mariano Lucchese | Sara Tonelli | Antonio Antetomaso
Proceedings of the Natural Legal Language Processing Workshop 2023
This work introduces a novel, extensive annotated corpus for multi-label legislative text classification in Italian, based on legal acts from the Gazzetta Ufficiale, the official source of legislative information of the Italian state. The annotated dataset, which we released to the community, comprises over 363,000 titles of legislative acts, spanning more than 30 years, from 1988 to 2022. Moreover, we evaluate four models for text classification on the dataset, demonstrating that the acts’ titles alone are sufficient to achieve top-level classification performance, with a micro F1-score of 0.87. Our analysis also shows that Italian domain-adapted legal models do not outperform general-purpose models on the task. Users can check the models’ performance through a demonstrator system provided in support of this work.
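For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-averaged F1-score is computed in a multi-label setting: true positives, false positives and false negatives are pooled over all documents and labels before the score is taken. The function name `micro_f1` and the example labels are hypothetical and do not come from the dataset or the authors' evaluation code.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 for multi-label classification.

    gold, pred: sequences of label sets, one set per document.
    Counts are pooled across all documents before computing F1.
    """
    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, pred):
        gold_labels, pred_labels = set(gold_labels), set(pred_labels)
        tp += len(gold_labels & pred_labels)   # labels correctly predicted
        fp += len(pred_labels - gold_labels)   # labels predicted but not gold
        fn += len(gold_labels - pred_labels)   # gold labels that were missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical labels for two act titles.
gold = [{"tax", "health"}, {"transport"}]
pred = [{"tax"}, {"transport", "health"}]
score = micro_f1(gold, pred)  # tp=2, fp=1, fn=1
```

Because counts are pooled, frequent labels weigh more heavily in a micro-averaged score than rare ones, in contrast to macro averaging.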