Ludovica Pannitto
2026
Survey of Tools for Manual Linguistic Annotation: Supporting Diversity through Interactive Exploration
Ludovica Pannitto | Kaja Dobrovoljc Zor | Bruno Guillaume
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Manual annotation tools are core infrastructure for corpus creation, enabling the development of linguistically informed language resources relevant for both linguistic discovery and computational applications. We present a comprehensive survey of 21 tools supporting morphosyntactic and multi-word expression annotation, systematically documenting more than 50 features relevant for annotation workflows—from software architecture and usability to linguistic coverage and annotation scope. The survey results are published as an open dataset and made accessible through an interactive online platform that allows users to filter and compare tools according to their specific needs. Our initial analysis highlights a robust and open ecosystem of annotation tools, but advanced needs for complex and language-independent annotation are inconsistently addressed.
Is Semi-Automatic Transcription Useful in Corpus Creation? Preliminary Considerations on the KIParla Corpus
Martina Simonotti | Ludovica Pannitto | Eleonora Zucchini | Silvia Ballarè | Caterina Mauri
Proceedings of the Fifteenth Language Resources and Evaluation Conference
This paper analyses the implementation of Automatic Speech Recognition (ASR) into the transcription workflow of the KIParla corpus, a resource of spoken Italian. Through a two-phase experiment, 11 expert and novice transcribers produced both manual and ASR-assisted transcriptions of identical audio segments across three different types of conversation, which were subsequently analyzed through a combination of statistical modeling, word-level alignment and a series of annotation-based metrics. Results show that ASR-assisted workflows can increase transcription speed but do not systematically improve accuracy or prosodic annotation quality. Improvements appear to depend on multiple factors, including workflow configuration, conversation type and annotator experience. These findings are therefore not yet generalizable and highlight the complex interplay between transcription expertise, data type and workflow design. Despite current limitations, ASR-assisted transcription, potentially when supported by task-specific fine-tuning, could be integrated into the KIParla transcription workflow to accelerate corpus creation without compromising linguistic and annotation quality. More broadly, this work underscores the potential of semi-automatic transcription for corpus building, especially in complex settings involving multiple speakers and spontaneous, conversational data.
2025
Constraining constructions with WordNet: pros and cons for the semantic annotation of fillers in the Italian Constructicon
Flavio Pisciotta | Ludovica Pannitto | Lucia Busso | Beatrice Bernasconi | Francesca Masini
Proceedings of the 13th Global Wordnet Conference
Introducing KIParla Forest: seeds for a UD annotation of interactional syntax
Ludovica Pannitto | Eleonora Zucchini | Silvia Ballarè | Cristina Bosco | Caterina Mauri | Manuela Sanguinetti
Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025)
The present project endeavors to enrich the linguistic resources available for Italian by introducing KIParla Forest, a treebank for the KIParla corpus, an existing and well-known resource for spoken Italian. This article contextualizes the project, describes the treebank creation process and design choices, and outlines plans for future improvements.
Deriving semantic classes of Italian adjectives via word embeddings: a large-scale investigation
Ivan Lacić | Ludovica Pannitto
Proceedings of the 13th Global Wordnet Conference
2024
Did Somebody Say ‘Gest-IT’? A Pilot Exploration of Multimodal Data Management
Ludovica Pannitto | Lorenzo Albanesi | Laura Marion | Federica Martines | Carmelo Caruso | Claudia Bianchini | Francesca Masini | Caterina Mauri
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024)
The paper presents a pilot exploration of the construction, management and analysis of a multimodal corpus. Through a three-layer annotation that provides orthographic, prosodic, and gestural transcriptions, the gest-IT resource allows one to investigate the variation of gesture-making patterns in conversations between sighted people and people with visual impairment. After discussing the transcription methods and technical procedures employed in our study, we will propose a unified CoNLL-U corpus and indicate our future steps.
2023
CALaMo: a Constructionist Assessment of Language Models
Ludovica Pannitto | Aurélie Herbelot
Proceedings of the First International Workshop on Construction Grammars and NLP (CxGs+NLP, GURT/SyntaxFest 2023)
This paper presents a novel framework for evaluating Neural Language Models’ linguistic abilities using a constructionist approach. Not only is the usage-based model in line with the underlying stochastic philosophy of neural architectures, but it also allows the linguist to keep meaning as a determinant factor in the analysis. We outline the framework and present two possible scenarios for its application.
2021
Teaching NLP with Bracelets and Restaurant Menus: An Interactive Workshop for Italian Students
Ludovica Pannitto | Lucia Busso | Claudia Roberta Combei | Lucio Messina | Alessio Miaschi | Gabriele Sarti | Malvina Nissim
Proceedings of the Fifth Workshop on Teaching NLP
Although Natural Language Processing is at the core of many tools young people use in their everyday life, high school curricula (in Italy) do not include any computational linguistics education. This lack of exposure makes the use of such tools less responsible than it could be, and makes choosing computational linguistics as a university degree unlikely. To raise awareness, curiosity, and longer-term interest in young people, we have developed an interactive workshop designed to illustrate the basic principles of NLP and computational linguistics to high school Italian students aged between 13 and 18 years. The workshop takes the form of a game in which participants play the role of machines needing to solve some of the most common problems a computer faces in understanding language: from voice recognition to Markov chains to syntactic parsing. Participants are guided through the workshop with the help of instructors, who present the activities and explain core concepts from computational linguistics. The workshop was presented at numerous outlets in Italy between 2019 and 2020, both face-to-face and online.
A dissemination workshop for introducing young Italian students to NLP
Lucio Messina | Lucia Busso | Claudia Roberta Combei | Alessio Miaschi | Ludovica Pannitto | Gabriele Sarti | Malvina Nissim
Proceedings of the Fifth Workshop on Teaching NLP
We describe and make available the game-based material developed for a laboratory run at several Italian science festivals to popularize NLP among young students.
2020
Recurrent babbling: evaluating the acquisition of grammar from limited input data
Ludovica Pannitto | Aurélie Herbelot
Proceedings of the 24th Conference on Computational Natural Language Learning
Recurrent Neural Networks (RNNs) have been shown to capture various aspects of syntax from raw linguistic input. In most previous experiments, however, learning happens over unrealistic corpora, which do not reflect the type and amount of data a child would be exposed to. This paper remedies this state of affairs by training an LSTM over a realistically sized subset of child-directed input. The behaviour of the network is analysed over time using a novel methodology which consists in quantifying the level of grammatical abstraction in the model’s generated output (its ‘babbling’), compared to the language it has been exposed to. We show that the LSTM indeed abstracts new structures as learning proceeds.
Are Word Embeddings Really a Bad Fit for the Estimation of Thematic Fit?
Emmanuele Chersoni | Ludovica Pannitto | Enrico Santus | Alessandro Lenci | Chu-Ren Huang
Proceedings of the Twelfth Language Resources and Evaluation Conference
While neural embeddings represent a popular choice for word representation in a wide variety of NLP tasks, their usage for thematic fit modeling has been limited, as they have been reported to lag behind syntax-based count models. In this paper, we propose a complete evaluation of count models and word embeddings on thematic fit estimation, by taking into account a larger number of parameters and verb roles and introducing also dependency-based embeddings in the comparison. Our results show a complex scenario, where a determinant factor for the performance seems to be the availability to the model of reliable syntactic information for building the distributional representations of the roles.
2018
MEDEA: Merging Event knowledge and Distributional vEctor Addition
Ludovica Pannitto | Alessandro Lenci
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)
Modelling Italian Construction Flexibility with Distributional Semantics: Are Constructions Enough?
Lucia Busso | Ludovica Pannitto | Alessandro Lenci
Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)
2017
AHyDA: Automatic Hypernym Detection with Feature Augmentation
Ludovica Pannitto | Lavinia Salicchi | Alessandro Lenci
Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017)
FA3L at SemEval-2017 Task 3: A ThRee Embeddings Recurrent Neural Network for Question Answering
Giuseppe Attardi | Antonio Carta | Federico Errica | Andrea Madotto | Ludovica Pannitto
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
In this paper we present ThReeNN, a model for Community Question Answering, Task 3, of SemEval-2017. The proposed model exploits both syntactic and semantic information to build a single and meaningful embedding space. Using a dependency parser in combination with word embeddings, the model creates sequences of inputs for a Recurrent Neural Network, which are then used for the ranking purposes of the Task. The score obtained on the official test data shows promising results.
Co-authors
- Lucia Busso 4
- Alessandro Lenci 4
- Caterina Mauri 3
- Silvia Ballarè 2
- Claudia Roberta Combei 2
- Aurélie Herbelot 2
- Francesca Masini 2
- Lucio Messina 2
- Alessio Miaschi 2
- Malvina Nissim 2
- Gabriele Sarti 2
- Eleonora Zucchini 2
- Lorenzo Albanesi 1
- Giuseppe Attardi 1
- Beatrice Bernasconi 1
- Claudia Bianchini 1
- Cristina Bosco 1
- Antonio Carta 1
- Carmelo Caruso 1
- Emmanuele Chersoni 1
- Kaja Dobrovoljc 1
- Federico Errica 1
- Bruno Guillaume 1
- Chu-Ren Huang 1
- Ivan Lacić 1
- Andrea Madotto 1
- Laura Marion 1
- Federica Martines 1
- Flavio Pisciotta 1
- Lavinia Salicchi 1
- Manuela Sanguinetti 1
- Enrico Santus 1
- Martina Simonotti 1