Arlindo Galvão Filho
Also published as: Arlindo Galvao Filho
2025
AKCIT at SemEval-2025 Task 11: Investigating Data Quality in Portuguese Emotion Recognition
Iago Brito | Fernanda Farber | Julia Dollis | Daniel Pedrozo | Artur Novais | Diogo Silva | Arlindo Galvão Filho
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
This paper investigates the impact of data quality and processing strategies on emotion recognition in Brazilian Portuguese (PTBR) texts. We focus on data distribution, linguistic context, and augmentation techniques such as translation and synthetic data generation. To evaluate these aspects, we conduct experiments on the PTBR portion of the BRIGHTER dataset, a manually curated multilingual dataset containing nearly 100,000 samples, of which 4,552 are in PTBR. Our study encompasses both multi-label emotion detection (presence/absence classification) and emotion intensity prediction (0 to 3 scale), following the SemEval-2025 Task 11 setup. Results demonstrate that emotion intensity labels enhance model performance after discretization, and that smaller multilingual models can outperform larger ones in low-resource settings. Our official submission ranked 6th, but further refinements improved our ranking to 3rd, trailing the top submission by only 0.047, reinforcing the significance of a data-centric approach in emotion recognition.
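As a rough illustration of the discretization step mentioned in the abstract, the following minimal sketch maps 0 to 3 intensity scores to presence/absence labels. The function name `discretize`, the `threshold` parameter, the rule "intensity of at least 1 means present", and the emotion names in the example are all assumptions made for illustration; the abstract does not specify how the authors performed this step.

```python
from typing import Dict


def discretize(intensities: Dict[str, int], threshold: int = 1) -> Dict[str, int]:
    """Map per-emotion intensity scores (0 to 3) to binary presence labels.

    Assumption: an emotion counts as present when its intensity is at
    least `threshold`. This rule is hypothetical, not taken from the paper.
    """
    for emotion, value in intensities.items():
        if not 0 <= value <= 3:
            raise ValueError(f"intensity for {emotion!r} must be in 0..3, got {value}")
    return {emotion: int(value >= threshold) for emotion, value in intensities.items()}


# Hypothetical sample with made-up emotion names and scores.
sample = {"anger": 0, "joy": 3, "sadness": 1}
labels = discretize(sample)
```

With the default threshold, any non-zero intensity becomes a positive label, so intensity annotations can be reused as training data for the presence/absence task.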
2022
CEIA-NLP at CASE 2022 Task 1: Protest News Detection for Portuguese
Diogo Fernandes | Adalberto Junior | Gabriel Marques | Anderson Soares | Arlindo Galvao Filho
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)
This paper summarizes our work on the document classification subtask of Multilingual Protest News Detection at the CASE @ ACL-IJCNLP 2022 workshop. In this context, we investigate the performance of monolingual and multilingual transformer-based models in low-resource settings, taking Portuguese as an example and evaluating language models on document classification. Our approach was the winning solution for Portuguese document classification, achieving an F1 score of 0.8007 on the test set. The experimental results demonstrate that multilingual models achieve the best results when few samples are available for a specific language, because the models can be trained on datasets from other languages for the same task and domain.
Co-authors
- Iago Brito 1
- Julia Dollis 1
- Fernanda Farber 1
- Diogo Fernandes 1
- Adalberto Junior 1