Enrico Palumbo


2022

Play música alegre: A Large-Scale Empirical Analysis of Cross-Lingual Phenomena in Voice Assistant Interactions
Donato Crisostomi | Alessandro Manzotti | Enrico Palumbo | Davide Bernardi | Sarah Campbell | Shubham Garg
Proceedings of the Massively Multilingual Natural Language Understanding Workshop (MMNLU-22)

Cross-lingual phenomena are quite common in informal contexts like social media, where users are likely to mix their native language with English or other languages. However, few studies have so far focused on analyzing cross-lingual interactions in voice-assistant data, which present peculiar features in terms of sentence length, named entities, and use of spoken language. Also, little attention has been paid to European countries, where English is frequently used as a second language. In this paper, we present a large-scale empirical analysis of cross-lingual phenomena (code-mixing, linguistic borrowing, foreign named entities) in the interactions with a large-scale voice assistant in European countries. To do this, we first introduce a general, highly scalable technique to generate synthetic mixed training data annotated with token-level language labels, and we train two neural network models to predict them. We evaluate the models both on the synthetic dataset and on a real dataset of code-switched utterances, showing that the best performance is obtained by a character-convolution-based model. The results of the analysis highlight different behaviors between countries, with Italy showing the highest ratio of cross-lingual utterances and Spain a marked preference for keeping Spanish words. Our research, paired with the increase in cross-lingual phenomena over time, motivates further research into developing multilingual Natural Language Understanding (NLU) models, which can naturally deal with cross-lingual interactions.

2020

Semantic Diversity for Natural Language Understanding Evaluation in Dialog Systems
Enrico Palumbo | Andrea Mezzalira | Cristina Marco | Alessandro Manzotti | Daniele Amberti
Proceedings of the 28th International Conference on Computational Linguistics: Industry Track

The quality of Natural Language Understanding (NLU) models is typically evaluated using aggregated metrics on a large number of utterances. In a dialog system, though, the manual analysis of failures on specific utterances is a time-consuming and yet critical endeavor to guarantee a high-quality customer experience. A crucial question for this analysis is how to create a test set of utterances that covers a diversity of possible customer requests. In this paper, we introduce the task of generating a test set with high semantic diversity for NLU evaluation in dialog systems and we describe an approach to address it. The approach starts by extracting high-traffic utterance patterns. Then, for each pattern, it achieves high diversity by selecting utterances from different regions of the utterance embedding space. We compare three selection strategies based on clustering of utterances in the embedding space, on solving the maximum distance optimization problem, and on simple heuristics such as random uniform sampling and popularity. The evaluation shows that the highest semantic and lexical diversity is obtained by a greedy maximum-sum-of-distances solver, with a runtime comparable to that of the clustering and heuristic approaches.
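
As an illustration of the greedy maximum-sum-of-distances idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes utterance embeddings are given as plain numeric tuples, uses Euclidean distance, and seeds the selection with the farthest pair; the function names, the seeding rule, and the toy data are all hypothetical choices made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_max_sum_selection(embeddings, k):
    """Greedily pick up to k indices with a large sum of pairwise distances.

    Hypothetical helper: seeds with the farthest pair, then repeatedly adds
    the point whose total distance to the already selected points is largest.
    """
    n = len(embeddings)
    if k <= 0 or n == 0:
        return []
    k = min(k, n)
    if k == 1:
        return [0]

    # Seed with the two points that are farthest apart.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(
        pairs, key=lambda p: euclidean(embeddings[p[0]], embeddings[p[1]])
    )
    selected = [first, second]

    # gain[i] holds the summed distance from point i to every selected point.
    gain = [
        sum(euclidean(embeddings[i], embeddings[s]) for s in selected)
        for i in range(n)
    ]

    while len(selected) < k:
        chosen = set(selected)
        nxt = max((i for i in range(n) if i not in chosen), key=lambda i: gain[i])
        selected.append(nxt)
        # Update gains incrementally with distances to the new point.
        for i in range(n):
            gain[i] += euclidean(embeddings[i], embeddings[nxt])

    return selected


# Toy 2-D "embeddings": three near-duplicates close to the origin and two outliers.
points = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (0.0, 10.0), (0.2, 0.1)]
print(sorted(greedy_max_sum_selection(points, 3)))
```

On the toy data, the selection keeps the two outliers and a single representative of the near-duplicate group, which is the behavior one would want from a diversity-oriented test set. The farthest-pair seed costs quadratic time in the number of candidates; a real system would likely restrict it to the utterances of one pattern at a time, as the abstract describes.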

2017

SentiME++ at SemEval-2017 Task 4: Stacking State-of-the-Art Classifiers to Enhance Sentiment Classification
Raphaël Troncy | Enrico Palumbo | Efstratios Sygkounas | Giuseppe Rizzo
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this paper, we describe the participation of the SentiME++ system in SemEval 2017 Task 4A “Sentiment Analysis in Twitter”, which aims to classify whether English tweets are of positive, neutral or negative sentiment. SentiME++ is an ensemble approach to sentiment analysis that leverages stacked generalization to automatically combine the predictions of five state-of-the-art sentiment classifiers. SentiME++ officially achieved an F1-score of 61.30%, ranking 12th out of 38 participants.