Ludovica Pannitto

2026

The paper proposes annotation guidelines for syntactic dependencies that span across speaker turns — including collaborative coconstructions proper, wh-question answers, and backchannels — in spoken language treebanks within the Universal Dependencies framework. Two representations are proposed: a speaker-based representation following the segmentation into speech turns, and a dependency-based representation with dependencies across speech turns. New propositions are also put forward to distinguish between reformulations and repairs, and to promote elements in unfinished phrases.

Say Again? The Limits of Whisper with Conversation. A Case Study on the KIParla Corpus.
Martina Simonotti | Ludovica Pannitto | Caterina Mauri | Adriano Ferraresi | Gabriele Carioli
Proceedings of Speech Language Models in Low-Resource Settings: Performance, Evaluation, and Bias Analysis (SPEAKABLE) @ LREC 2026

This study investigates how Whisper handles interactional phenomena in spontaneous Italian conversation, focusing on backchannels, repairs, and filled pauses. We compare standard Word Error Rate (WER) optimization with a decoding strategy that explicitly rewards the preservation of interactional events. Results show that decoding choices have limited impact on overall accuracy, while recognition remains strongly phenomenon-dependent, suggesting structural limitations in the handling of interactional phenomena, with systematic linearization of repairs and frequent suppression of short conversational items.

Analyzing Environmental Discourse through Construction-Based Pattern Extraction
Elisa Chierchiello | Eliana Di Palma | Ludovica Pannitto | Cristina Bosco
Proceedings of the 2nd Workshop on Ecology, Environment, and Natural Language Processing

Environmental issues are at the centre of a debate currently taking place across all communication channels. This paper provides an analysis of texts in which these issues are discussed, with the novelty of applying a methodology that enables the extraction and comparison of different narratives and points of view. The texts used in this study are the English Living Planet Reports published biennially by the WWF from 2014 to 2024. The methodology is based on the extraction of constructions – patterns collected in the English constructicon CASA – which allow us to identify differences in the presentation of the issues discussed in the analysed texts. Our results show that this methodology can be very helpful in the comparative analysis of texts to reveal different perspectives, for example, to observe diachronic variations.

Survey of Tools for Manual Linguistic Annotation: Supporting Diversity through Interactive Exploration
Ludovica Pannitto | Kaja Dobrovoljc Zor | Bruno Guillaume
Proceedings of the Fifteenth Language Resources and Evaluation Conference

Manual annotation tools are core infrastructure for corpus creation, enabling the development of linguistically informed language resources relevant for both linguistic discovery and computational applications. We present a comprehensive survey of 21 tools supporting morphosyntactic and multi-word expression annotation, systematically documenting more than 50 features relevant for annotation workflows—from software architecture and usability to linguistic coverage and annotation scope. The survey results are published as an open dataset and made accessible through an interactive online platform that allows users to filter and compare tools according to their specific needs. Our initial analysis highlights a robust and open ecosystem of annotation tools, but advanced needs for complex and language-independent annotation are inconsistently addressed.

Is Semi-Automatic Transcription Useful in Corpus Creation? Preliminary Considerations on the KIParla Corpus
Martina Simonotti | Ludovica Pannitto | Eleonora Zucchini | Silvia Ballarè | Caterina Mauri
Proceedings of the Fifteenth Language Resources and Evaluation Conference

This paper analyses the integration of Automatic Speech Recognition (ASR) into the transcription workflow of the KIParla corpus, a resource of spoken Italian. Through a two-phase experiment, 11 expert and novice transcribers produced both manual and ASR-assisted transcriptions of identical audio segments across three different types of conversation, which were subsequently analyzed through a combination of statistical modeling, word-level alignment and a series of annotation-based metrics. Results show that ASR-assisted workflows can increase transcription speed but do not systematically improve accuracy or prosodic annotation quality. Improvements appear to depend on multiple factors, including workflow configuration, conversation type and annotator experience. These findings are therefore not yet generalizable and highlight the complex interplay between transcription expertise, data type and workflow design. Despite current limitations, ASR-assisted transcription, potentially when supported by task-specific fine-tuning, could be integrated into the KIParla transcription workflow to accelerate corpus creation without compromising linguistic and annotation quality. More broadly, this work underscores the potential of semi-automatic transcription for corpus building, especially in complex settings involving multiple speakers and spontaneous, conversational data.

‘Layer su Layer’: Identifying and Disambiguating the Italian NPN Construction in BERT’s family
Greta Gorzoni | Ludovica Pannitto | Francesca Masini
Proceedings of the 15th Workshop on Cognitive Modeling and Computational Linguistics

2025

Constraining constructions with WordNet: pros and cons for the semantic annotation of fillers in the Italian Constructicon
Flavio Pisciotta | Ludovica Pannitto | Lucia Busso | Beatrice Bernasconi | Francesca Masini
Proceedings of the 13th Global Wordnet Conference

Deriving semantic classes of Italian adjectives via word embeddings: a large-scale investigation
Ivan Lacić | Ludovica Pannitto
Proceedings of the 13th Global Wordnet Conference

Introducing KIParla Forest: seeds for a UD annotation of interactional syntax
Ludovica Pannitto | Eleonora Zucchini | Silvia Ballarè | Cristina Bosco | Caterina Mauri | Manuela Sanguinetti
Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025)

The present project aims to enrich the linguistic resources available for Italian by introducing KIParla Forest, a treebank for the KIParla corpus, an existing and well-known resource for spoken Italian. This article contextualizes the project, describes the treebank creation process and design choices, and outlines plans for future improvements.

2024

The paper presents a pilot exploration of the construction, management and analysis of a multimodal corpus. Through a three-layer annotation that provides orthographic, prosodic, and gestural transcriptions, the gest-IT resource allows one to investigate the variation of gesture-making patterns in conversations between sighted people and people with visual impairment. After discussing the transcription methods and technical procedures employed in our study, we will propose a unified CoNLL-U corpus and indicate our future steps.

2023

CALaMo: a Constructionist Assessment of Language Models
Ludovica Pannitto | Aurélie Herbelot
Proceedings of the First International Workshop on Construction Grammars and NLP (CxGs+NLP, GURT/SyntaxFest 2023)

This paper presents a novel framework for evaluating Neural Language Models’ linguistic abilities using a constructionist approach. Not only is the usage-based model in line with the underlying stochastic philosophy of neural architectures, but it also allows the linguist to keep meaning as a determinant factor in the analysis. We outline the framework and present two possible scenarios for its application.

2021

We describe and make available the game-based material developed for a laboratory run at several Italian science festivals to popularize NLP among young students.

Although Natural Language Processing is at the core of many tools young people use in their everyday life, high school curricula (in Italy) do not include any computational linguistics education. This lack of exposure makes the use of such tools less responsible than it could be, and makes choosing computational linguistics as a university degree unlikely. To raise awareness, curiosity, and longer-term interest in young people, we have developed an interactive workshop designed to illustrate the basic principles of NLP and computational linguistics to Italian high school students aged between 13 and 18 years. The workshop takes the form of a game in which participants play the role of machines needing to solve some of the most common problems a computer faces in understanding language: from voice recognition to Markov chains to syntactic parsing. Participants are guided through the workshop with the help of instructors, who present the activities and explain core concepts from computational linguistics. The workshop was presented at numerous outlets in Italy between 2019 and 2020, both face-to-face and online.

2020

Are Word Embeddings Really a Bad Fit for the Estimation of Thematic Fit?
Emmanuele Chersoni | Ludovica Pannitto | Enrico Santus | Alessandro Lenci | Chu-Ren Huang
Proceedings of the Twelfth Language Resources and Evaluation Conference

While neural embeddings represent a popular choice for word representation in a wide variety of NLP tasks, their usage for thematic fit modeling has been limited, as they have been reported to lag behind syntax-based count models. In this paper, we propose a complete evaluation of count models and word embeddings on thematic fit estimation, by taking into account a larger number of parameters and verb roles and also introducing dependency-based embeddings in the comparison. Our results show a complex scenario, where a determinant factor for the performance seems to be the availability to the model of reliable syntactic information for building the distributional representations of the roles.

Recurrent babbling: evaluating the acquisition of grammar from limited input data
Ludovica Pannitto | Aurélie Herbelot
Proceedings of the 24th Conference on Computational Natural Language Learning

Recurrent Neural Networks (RNNs) have been shown to capture various aspects of syntax from raw linguistic input. In most previous experiments, however, learning happens over unrealistic corpora, which do not reflect the type and amount of data a child would be exposed to. This paper remedies this state of affairs by training an LSTM over a realistically sized subset of child-directed input. The behaviour of the network is analysed over time using a novel methodology which consists in quantifying the level of grammatical abstraction in the model’s generated output (its ‘babbling’), compared to the language it has been exposed to. We show that the LSTM indeed abstracts new structures as learning proceeds.

2018

MEDEA: Merging Event knowledge and Distributional vEctor Addition
Ludovica Pannitto | Alessandro Lenci
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

Modelling Italian Construction Flexibility with Distributional Semantics: Are Constructions Enough?
Lucia Busso | Ludovica Pannitto | Alessandro Lenci
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)

2017

FA3L at SemEval-2017 Task 3: A ThRee Embeddings Recurrent Neural Network for Question Answering
Giuseppe Attardi | Antonio Carta | Federico Errica | Andrea Madotto | Ludovica Pannitto
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this paper we present ThReeNN, a model for Community Question Answering, Task 3, of SemEval-2017. The proposed model exploits both syntactic and semantic information to build a single and meaningful embedding space. Using a dependency parser in combination with word embeddings, the model creates sequences of inputs for a Recurrent Neural Network, which are then used for the ranking purposes of the Task. The score obtained on the official test data shows promising results.

AHyDA: Automatic Hypernym Detection with Feature Augmentation
Ludovica Pannitto | Lavinia Salicchi | Alessandro Lenci
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)
