Giuseppe Serra


AILAB-Udine@SMM4H’22: Limits of Transformers and BERT Ensembles
Beatrice Portelli | Simone Scaboro | Emmanuele Chersoni | Enrico Santus | Giuseppe Serra
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

This paper describes the models developed by the AILAB-Udine team for the SMM4H’22 Shared Task. We explored the limits of Transformer based models on text classification, entity extraction and entity normalization, tackling Tasks 1, 2, 5, 6 and 10. The main takeaways we got from participating in different tasks are: the overwhelming positive effects of combining different architectures when using ensemble learning, and the great potential of generative models for term normalization.

Generalizing over Long Tail Concepts for Medical Term Normalization
Beatrice Portelli | Simone Scaboro | Enrico Santus | Hooman Sedghamiz | Emmanuele Chersoni | Giuseppe Serra
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Medical term normalization consists in mapping a piece of text to a large number of output classes.Given the small size of the annotated datasets and the extremely long tail distribution of the concepts, it is of utmost importance to develop models that are capable to generalize to scarce or unseen concepts.An important attribute of most target ontologies is their hierarchical structure. In this paper we introduce a simple and effective learning strategy that leverages such information to enhance the generalizability of both discriminative and generative models.The evaluation shows that the proposed strategy produces state-of-the-art performance on seen concepts and consistent improvements on unseen ones, allowing also for efficient zero-shot knowledge transfer across text typologies and datasets.


BERT Prescriptions to Avoid Unwanted Headaches: A Comparison of Transformer Architectures for Adverse Drug Event Detection
Beatrice Portelli | Edoardo Lenzi | Emmanuele Chersoni | Giuseppe Serra | Enrico Santus
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Pretrained transformer-based models, such as BERT and its variants, have become a common choice to obtain state-of-the-art performances in NLP tasks. In the identification of Adverse Drug Events (ADE) from social media texts, for example, BERT architectures rank first in the leaderboard. However, a systematic comparison between these models has not yet been done. In this paper, we aim at shedding light on the differences between their performance analyzing the results of 12 models, tested on two standard benchmarks. SpanBERT and PubMedBERT emerged as the best models in our evaluation: this result clearly shows that span-based pretraining gives a decisive advantage in the precise recognition of ADEs, and that in-domain language pretraining is particularly useful when the transformer model is trained just on biomedical text from scratch.

NADE: A Benchmark for Robust Adverse Drug Events Extraction in Face of Negations
Simone Scaboro | Beatrice Portelli | Emmanuele Chersoni | Enrico Santus | Giuseppe Serra
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Adverse Drug Event (ADE) extraction models can rapidly examine large collections of social media texts, detecting mentions of drug-related adverse reactions and trigger medical investigations. However, despite the recent advances in NLP, it is currently unknown if such models are robust in face of negation, which is pervasive across language varieties. In this paper we evaluate three state-of-the-art systems, showing their fragility against negation, and then we introduce two possible strategies to increase the robustness of these models: a pipeline approach, relying on a specific component for negation detection; an augmentation of an ADE extraction dataset to artificially create negated samples and further train the models. We show that both strategies bring significant increases in performance, lowering the number of spurious entities predicted by the models. Our dataset and code will be publicly released to encourage research on the topic.


Keyphrase Generation with GANs in Low-Resources Scenarios
Giuseppe Lancioni | Saida S.Mohamed | Beatrice Portelli | Giuseppe Serra | Carlo Tasso
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing

Keyphrase Generation is the task of predicting Keyphrases (KPs), short phrases that summarize the semantic meaning of a given document. Several past studies provided diverse approaches to generate Keyphrases for an input document. However, all of these approaches still need to be trained on very large datasets. In this paper, we introduce BeGanKP, a new conditional GAN model to address the problem of Keyphrase Generation in a low-resource scenario. Our main contribution relies in the Discriminator’s architecture: a new BERT-based module which is able to distinguish between the generated and humancurated KPs reliably. Its characteristics allow us to use it in a low-resource scenario, where only a small amount of training data are available, obtaining an efficient Generator. The resulting architecture achieves, on five public datasets, competitive results with respect to the state-of-the-art approaches, using less than 1% of the training data.

Distilling the Evidence to Augment Fact Verification Models
Beatrice Portelli | Jason Zhao | Tal Schuster | Giuseppe Serra | Enrico Santus
Proceedings of the Third Workshop on Fact Extraction and VERification (FEVER)

The alarming spread of fake news in social media, together with the impossibility of scaling manual fact verification, motivated the development of natural language processing techniques to automatically verify the veracity of claims. Most approaches perform a claim-evidence classification without providing any insights about why the claim is trustworthy or not. We propose, instead, a model-agnostic framework that consists of two modules: (1) a span extractor, which identifies the crucial information connecting claim and evidence; and (2) a classifier that combines claim, evidence, and the extracted spans to predict the veracity of the claim. We show that the spans are informative for the classifier, improving performance and robustness. Tested on several state-of-the-art models over the Fever dataset, the enhanced classifiers consistently achieve higher accuracy while also showing reduced sensitivity to artifacts in the claims.


Predicting the Usefulness of Amazon Reviews Using Off-The-Shelf Argumentation Mining
Marco Passon | Marco Lippi | Giuseppe Serra | Carlo Tasso
Proceedings of the 5th Workshop on Argument Mining

Internet users generate content at unprecedented rates. Building intelligent systems capable of discriminating useful content within this ocean of information is thus becoming a urgent need. In this paper, we aim to predict the usefulness of Amazon reviews, and to do this we exploit features coming from an off-the-shelf argumentation mining system. We argue that the usefulness of a review, in fact, is strictly related to its argumentative content, whereas the use of an already trained system avoids the costly need of relabeling a novel dataset. Results obtained on a large publicly available corpus support this hypothesis.


Exploiting and Evaluating a Supervised, Multilanguage Keyphrase Extraction pipeline for under-resourced languages
Marco Basaldella | Muhammad Helmy | Elisa Antolli | Mihai Horia Popescu | Giuseppe Serra | Carlo Tasso
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

This paper evaluates different techniques for building a supervised, multilanguage keyphrase extraction pipeline for languages which lack a gold standard. Starting from an unsupervised English keyphrase extraction pipeline, we implement pipelines for Arabic, Italian, Portuguese, and Romanian, and we build test collections for languages which lack one. Then, we add a Machine Learning module trained on a well-known English language corpus and we evaluate the performance not only over English but on the other languages as well. Finally, we repeat the same evaluation after training the pipeline over an Arabic language corpus to check whether using a language-specific corpus brings a further improvement in performance. On the five languages we analyzed, results show an improvement in performance when using a machine learning algorithm, even if such algorithm is not trained and tested on the same language.