AKCIT at SemEval-2025 Task 11: Investigating Data Quality in Portuguese Emotion Recognition

Iago Brito, Fernanda Farber, Julia Dollis, Daniel Pedrozo, Artur Novais, Diogo Silva, Arlindo Galvão Filho


Abstract
This paper investigates the impact of data quality and processing strategies on emotion recognition in Brazilian Portuguese (PTBR) texts. We focus on data distribution, linguistic context, and augmentation techniques such as translation and synthetic data generation. To evaluate these aspects, we conduct experiments on the PTBR portion of the BRIGHTER dataset, a manually curated multilingual dataset containing nearly 100,000 samples, of which 4,552 are in PTBR. Our study encompasses both multi-label emotion detection (presence/absence classification) and emotion intensity prediction (0 to 3 scale), following the SemEval-2025 Task 11 setup. Results demonstrate that emotion intensity labels enhance model performance after discretization, and that smaller multilingual models can outperform larger ones in low-resource settings. Our official submission ranked 6th, but further refinements improved our ranking to 3rd, trailing the top submission by only 0.047, reinforcing the significance of a data-centric approach in emotion recognition.
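As an illustration of the discretization step mentioned in the abstract, the following minimal sketch maps 0 to 3 intensity scores to presence/absence labels. The function name `discretize`, the emotion names, and the threshold of 1 (any non-zero intensity counts as present) are assumptions made for this example; the abstract does not specify the exact rule used.

```python
from typing import Dict


def discretize(intensities: Dict[str, int], threshold: int = 1) -> Dict[str, int]:
    """Map per-emotion intensity scores (0 to 3) to binary presence labels.

    Assumption: an emotion counts as present when its intensity is at or
    above `threshold`. The default of 1 is illustrative, not from the paper.
    """
    for name, value in intensities.items():
        if not 0 <= value <= 3:
            raise ValueError(f"intensity for {name!r} must be in 0..3, got {value}")
    return {name: int(value >= threshold) for name, value in intensities.items()}


# Hypothetical sample: emotion names and scores are made up for illustration.
example = {"anger": 0, "fear": 2, "joy": 0, "sadness": 3}
labels = discretize(example)
print(labels)
```

A stricter threshold, such as `discretize(example, threshold=2)`, would treat low-intensity emotions as absent, which is one way to explore how the cut-off affects the resulting label distribution.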
Anthology ID:
2025.semeval-1.300
Volume:
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
Venues:
SemEval | WS
Publisher:
Association for Computational Linguistics
Pages:
2305–2310
URL:
https://preview.aclanthology.org/transition-to-people-yaml/2025.semeval-1.300/
Cite (ACL):
Iago Brito, Fernanda Farber, Julia Dollis, Daniel Pedrozo, Artur Novais, Diogo Silva, and Arlindo Galvão Filho. 2025. AKCIT at SemEval-2025 Task 11: Investigating Data Quality in Portuguese Emotion Recognition. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 2305–2310, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
AKCIT at SemEval-2025 Task 11: Investigating Data Quality in Portuguese Emotion Recognition (Brito et al., SemEval 2025)
PDF:
https://preview.aclanthology.org/transition-to-people-yaml/2025.semeval-1.300.pdf