Marcelo Finger

2026

Pretrained Neural Audio Models for Asthma Detection from Voice and Speech
Leticia Puttlitz Boll | Antonio Oss Boll | Yan Anderson Pires de Oliveira | Victor dos Santos Silva | Mariana Lopes Pestana | Celso Ricardo Fernandes de Carvalho | Marcelo Matheus Gauy | Marcelo Finger
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2

Asthma is a chronic respiratory disease that affects breathing and may also influence speech and voice production. In this paper, we examine whether short mobile-recorded Brazilian Portuguese voice and speech audio contain cues that can be used to distinguish individuals with asthma from those without asthma. We approach this problem using transfer learning with pretrained neural audio models based on convolutional architectures trained on large-scale audio datasets (PANNs). We evaluate two recording types: sustained vowel phonation and read speech. Models are trained for a binary classification task and evaluated at both the segment level and the patient level. Read speech performs better than sustained vowels. The best configuration (CNN14 on speech) achieves 0.85 patient-level balanced accuracy (accuracy 0.85) with ROC-AUC 0.93 and PR-AUC 0.98, performing comparably to CNN10. Training from scratch performs worse than fine-tuning a pretrained model, showing that pretraining helps when data is limited. Performance also varies across age groups, suggesting demographic sensitivity. These findings support the feasibility of audio-based asthma classification from voice and speech and motivate further investigation of pretrained audio models in biomedical applications.

pdf bib abs

Compression-based Language Complexity under Register Variation in Portuguese
Felipe Ribas Serras | Marcelo Finger
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1

Compression-based language complexity metrics show promise as holistic parameters for measuring linguistic complexity across intra- and cross-linguistic scenarios. Yet, their sensitivity to specific forms of linguistic variation requires further experimental validation. We examine the sensitivity of this metric family to register variation in Portuguese, a phenomenon already established for English. We refine the validation process found in previous literature by introducing a more granular statistical analysis to evaluate both the individual and joint sensitivity of these metrics to register variation at the sentence level. Our results confirm they are highly sensitive to functional variation in Portuguese, exhibiting the same structural morphosyntactic trade-off consistent with that observed in English and in cross-linguistic studies.

pdf bib abs

FlexQwen: Exploring Hybrid Objectives and Text Originality for Portuguese
Miguel de Mello Carpi | Marcelo Finger
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1

While scaling laws suggest increasing model and dataset sizes for better results, efficient pre-training techniques for low-resource scenarios present unique challenges that require further investigation. This work introduces FlexQwen, a model based on the Qwen 3 architecture adapted for a hybrid causal-masked objective, and the Carolina Originality dataset, a subset of the Corpus Carolina tailored for efficient pre-training in Portuguese. We investigate two primary research questions: the influence of hybrid masked-causal modelling and the impact of text originality on model performance. Our experiments compare a high-originality Gold split against a length-matched control group. Results indicate that hybrid objectives may be viable for efficient training. Furthermore, we provide open access to our code, datasets, and training logs to foster further research in efficient Portuguese LLMs.

2024

pdf bib abs

Analysing and Validating Language Complexity Metrics Across South American Indigenous Languages
Felipe Serras | Miguel Carpi | Matheus Branco | Marcelo Finger
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Language complexity is an emerging concept critical for NLP and for quantitative and cognitive approaches to linguistics. In this work, we evaluate the behavior of a set of compression-based language complexity metrics when applied to a large set of native South American languages. Our goal is to validate the desirable properties of such metrics against a more diverse set of languages, guaranteeing the universality of the techniques developed on the basis of this type of theoretical artifact. Our analysis confirmed with statistical confidence most propositions about the metrics studied, affirming their robustness, despite showing less stability than when the same metrics were applied to Indo-European languages. We also observed that the trade-off between morphological and syntactic complexities is strongly related to language phylogeny.

pdf bib

Exploring Computational Discernibility of Discourse Domains in Brazilian Portuguese within the Carolina Corpus
Felipe Ribas Serras | Mariana Sturzeneker | Miguel de Mello Carpi | Mayara Feliciano Palma | Maria Clara Ramos Morales Crespo | Aline Silva Costa | Vanessa Martins Do Monte | Cristiane Namiuti | Maria Clara Paixão de Souza | Marcelo Finger
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1

2021

pdf bib

Audio MFCC-gram Transformers for respiratory insufficiency detection in COVID-19
Marcelo Gauy | Marcelo Finger
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology

pdf bib

verBERT: Automating Brazilian Case Law Document Multi-label Categorization Using BERT
Felipe Serras | Marcelo Finger
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology

pdf bib

2019

pdf bib abs

A logical-based corpus for cross-lingual evaluation
Felipe Salvatore | Marcelo Finger | Roberto Hirata Jr
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

At present, different deep learning models are presenting high accuracy on popular inference datasets such as SNLI, MNLI, and SciTail. However, there are different indicators that those datasets can be exploited by using some simple linguistic patterns. This fact poses difficulties to our understanding of the actual capacity of machine learning models to solve the complex task of textual inference. We propose a new set of syntactic tasks focused on contradiction detection that require specific capacities over linguistic logical forms such as: Boolean coordination, quantifiers, definite description, and counting operators. We evaluate two kinds of deep learning models that implicitly exploit language structure: recurrent models and the Transformer network BERT. We show that although BERT is clearly more efficient to generalize over most logical forms, there is space for improvement when dealing with counting operators. Since the syntactic tasks can be implemented in different languages, we show a successful case of cross-lingual transfer learning between English and Portuguese.