2025
Portuguese Automated Fact-checking: Information Retrieval with Claim extraction
Juliana Gomes | Eduardo Garcia | Arlindo Rodrigues Galvão Filho
Proceedings of the Eighth Fact Extraction and VERification Workshop (FEVER)
Current Portuguese Automated Fact-Checking (AFC) research often relies on datasets lacking the integrated external evidence crucial for comprehensive verification. This study addresses this gap by systematically enriching Portuguese misinformation datasets. We retrieve web evidence by simulating user information-seeking behavior, guided by core claims extracted using Large Language Models (LLMs). Additionally, we apply a semi-automated validation framework to enhance dataset reliability. Our analysis reveals that inherent dataset characteristics impact data properties, evidence retrieval, and AFC model performance. While enrichment generally improves detection, its efficacy varies, influenced by challenges such as self-reinforcing online misinformation and API limitations. This work contributes enriched datasets, associating original texts with retrieved evidence and LLM-extracted claims, to foster future evidence-based fact-checking research. The code and enriched data for this study are available at https://github.com/ju-resplande/pt_afc.
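A minimal, hypothetical sketch of the enrichment pipeline the abstract describes: each original text is paired with an LLM-extracted core claim and the web evidence retrieved for that claim. The extract_claim and search_web callables stand in for the LLM and search-API steps; they and the EnrichedExample record are assumptions for illustration, not the authors' implementation.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class EnrichedExample:
    original_text: str        # post/news item from the source misinformation dataset
    claim: str                # core claim extracted by an LLM
    evidence: list[str] = field(default_factory=list)  # web snippets retrieved for the claim

def enrich(texts: list[str],
           extract_claim: Callable[[str], str],
           search_web: Callable[[str], list[str]],
           top_k: int = 5) -> list[EnrichedExample]:
    # Simulate a user's information-seeking behavior: query the web with the
    # extracted claim and keep the top-k evidence snippets for each text.
    enriched = []
    for text in texts:
        claim = extract_claim(text)
        evidence = search_web(claim)[:top_k]
        enriched.append(EnrichedExample(text, claim, evidence))
    return enriched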
2024
RoBERTaLexPT: A Legal RoBERTa Model pretrained with deduplication for Portuguese
Eduardo Garcia | Nadia Silva | Felipe Siqueira | Juliana Gomes | Hidelberg O. Albuquerque | Ellen Souza | Eliomar Lima | André de Carvalho
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1
2023
DeepLearningBrasil@LT-EDI-2023: Exploring Deep Learning Techniques for Detecting Depression in Social Media Text
Eduardo Garcia | Juliana Gomes | Adalberto Ferreira Barbosa Junior | Cardeque Henrique Bittes de Alvarenga Borges | Nadia Félix Felipe da Silva
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion
In this paper, we describe the strategy employed by our team, DeepLearningBrasil, which secured first place in the DepSign-LT-EDI@RANLP-2023 shared task by a margin of 2.4%. The task was to classify social media texts into three levels of depression: “not depressed,” “moderately depressed,” and “severely depressed.” We further pre-trained RoBERTa and DeBERTa models on a Reddit dataset curated specifically from mental-health-related communities (subreddits), leading to an improved understanding of nuanced mental health discourse. To handle lengthy texts, we introduced truncation techniques that retain the essence of each post by keeping its beginning and ending. We made the models robust to unbalanced data by incorporating sample weights into the loss. Cross-validation and ensemble techniques were then employed to combine our k-fold trained models into the final solution. The accompanying code is made available for transparency and further development.
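A minimal sketch (not the authors' released code) of the two ideas above: head-plus-tail truncation for long posts and a weighted loss for the unbalanced label distribution. The checkpoint name and class counts are placeholders, and class-weighted cross-entropy is assumed here as one common way to realize the sample weighting described in the abstract.

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")  # placeholder checkpoint

def head_tail_truncate(text: str, max_tokens: int = 512, head: int = 256) -> list:
    # Keep the first `head` tokens and the last `max_tokens - head` tokens,
    # preserving both the beginning and the ending of a long post.
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    if len(ids) <= max_tokens:
        return ids
    return ids[:head] + ids[-(max_tokens - head):]

# Class-weighted cross-entropy: rarer depression levels receive larger weights
# so they are not drowned out by the majority class (counts are illustrative only).
class_counts = torch.tensor([7000.0, 2000.0, 500.0])
class_weights = class_counts.sum() / (len(class_counts) * class_counts)
loss_fn = torch.nn.CrossEntropyLoss(weight=class_weights)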