Arlindo Galvão Filho
Also published as: Arlindo Galvao Filho
2026
AKCIT at SemEval-2026 Task 13: A Lightweight LightGBM Baseline for Cross-Language Detection of LLM-Generated Code
Rone Brandao Filho | Walcy Santos Rezende Rios | Lucas Neves | Jose Ricardo Fleury Oliveira | Diogo Fernandes | Arlindo Galvão Filho
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
The widespread use of LLMs in software development has made the detection of machine-generated code a pressing challenge, particularly when models must generalize across programming languages and domains. We present a lightweight, LLM-free pipeline that combines stylometric feature extraction with a LightGBM classifier and explicitly prioritizes structural generalization over deep semantic modeling. Despite its simplicity, the method achieves a Macro F1 of 0.70–0.72, more than doubling the CodeBERT baseline (0.30) in SemEval-2026 Task 13 Subtask A, while operating without GPUs or any fine-tuning.
2022
CEIA-NLP at CASE 2022 Task 1: Protest News Detection for Portuguese
Diogo Fernandes | Adalberto Junior | Gabriel Marques | Anderson Soares | Arlindo Galvao Filho
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)
This paper summarizes our work on the document classification subtask of Multilingual Protest News Detection at the CASE @ ACL-IJCNLP 2022 workshop. We investigate the performance of monolingual and multilingual transformer-based models in low-resource settings, taking Portuguese as an example and evaluating language models on document classification. Our approach was the winning solution for Portuguese document classification, achieving an F1 score of 0.8007 on the test set. The experimental results show that multilingual models achieve the best results when only a few samples are available for a specific language, because they can be trained on datasets from other languages for the same task and domain.