Adriana Silvina Pagano


2025

Communicating urgency to prevent environmental damage: insights from a linguistic analysis of the WWF24 multilingual corpus
Cristina Bosco | Adriana Silvina Pagano | Elisa Chierchiello
Proceedings of the 1st Workshop on Ecology, Environment, and Natural Language Processing (NLP4Ecology2025)

Contemporary environmental discourse focuses on effectively communicating ecological vulnerability to raise public awareness and encourage positive actions. Hence there is a need for studies to support accurate and adequate discourse production, both by humans and computers. Two main challenges need to be tackled. On the one hand, the language used to communicate about environmental issues can be very complex for both human and automatic analysis, and few resources exist to train and test NLP tools. On the other hand, in the current international scenario, most texts are written in multiple languages or translated from a major to a minor language, resulting in different meanings in different languages and cultural contexts. This paper presents a novel parallel corpus comprising the text of the World Wide Fund for Nature (WWF) 2024 Annual Report in English and its translations into Italian and Brazilian Portuguese, and analyses their linguistic features.

TreEn: A Multilingual Treebank Project on Environmental Discourse
Adriana Silvina Pagano | Patricia Chiril | Elisa Chierchiello | Cristina Bosco
Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025)

The increasing complexity of environmental discourse is directly proportional to the growing complexity of environmental debates present today in all communication media. While linguistic and communication studies have been pursued on this discourse, the development of computational linguistic tools and resources dedicated to supporting its analysis and interpretation is still incipient. For one, no morphosyntactic resources specific to the environmental domain can be found on major platforms and repositories. This paper introduces TreEn, a multilingual treebank project in progress which compiles texts on environmental discourse produced in different conversational and communication contexts. In particular, it reports on the parallel component of the project and discusses issues faced during sentence-level alignment between original and translated texts, annotation of texts following UD guidelines, and labeling of entities drawing on an ontology of environment-related topics. This novel resource is expected to support environmental discourse analysis by providing morphological and syntactic data to enable cross-language and cross-cultural comparison based on the semantics of the entities annotated in the treebank.

Extending the Enhanced Universal Dependencies – addressing subjects in pro-drop languages
Magali Sanches Duran | Elvis A. de Souza | Maria das Graças Volpe Nunes | Adriana Silvina Pagano | Thiago A. S. Pardo
Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025)

Enhanced Universal Dependencies (EUD) serve as a crucial link between syntax and semantics. Beyond basic syntactic dependencies, EUD provide refined logical connections that are valuable for downstream tasks such as semantic role labeling, coreference resolution, information extraction, and question answering. The original EUD framework defines six types of relationships, but this paper introduces an extension designed to address subject propagation in pro-drop languages. This “Extended EUD” proposal increases the number of relationships that may be annotated in sentences, improving linguistic representation. Additionally, we report our experiments on a corpus of Portuguese (a pro-drop language), which we make publicly available to the research community.

2024

Explaining the Hardest Errors of Contextual Embedding Based Classifiers
Claudio Moisés Valiense De Andrade | Washington Cunha | Guilherme Fonseca | Ana Clara Souza Pagano | Luana De Castro Santos | Adriana Silvina Pagano | Leonardo Chaves Dutra Da Rocha | Marcos André Gonçalves
Proceedings of the 28th Conference on Computational Natural Language Learning

We seek to explain the causes of the misclassification of the most challenging documents, namely those that no classifier using state-of-the-art, very semantically-separable contextual embedding representations managed to predict accurately. To do so, we propose a taxonomy of incorrect predictions, which we used to perform qualitative human evaluation. We posed two (research) questions, considering three sentiment datasets in two different domains – movie and product reviews. Evaluators with two different backgrounds evaluated documents by comparing the predominant sentiment assigned by the model to the label in the gold dataset in order to decide on a likely misclassification reason. Based on a high inter-evaluator agreement (81.7%), we observed significant differences between the product and movie review domains, such as the prevalence of ambivalence in product reviews and sarcasm in movie reviews. Our analysis also revealed an unexpectedly high rate of incorrect labeling in the gold dataset (up to 33%) and a significant number of incorrect predictions by the model due to a series of linguistic phenomena (including amplified words, contrastive markers, comparative sentences, and references to world knowledge). Overall, our taxonomy and methodology allow us to explain between 80% and 85% of the errors with high confidence (agreement) – enabling us to point out where future efforts to improve models should be concentrated.

Toxic Content Detection in online social networks: a new dataset from Brazilian Reddit Communities
Luiz Henrique Quevedo Lima | Adriana Silvina Pagano | Ana Paula Couto da Silva
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1