Fabio Zammit
2026
SentiMalti: A Maltese Sentiment Analysis Dataset and Models
Ian Caruana | Matthew Vella | Fabio Zammit | Kurt Micallef | Claudia Borg
Proceedings of the Fifteenth Language Resources and Evaluation Conference
We present SentiMalti, a new Maltese social media sentiment resource and accompanying baselines. We scrape user-generated content from YouTube, Reddit, and Facebook, then apply a Maltese-aware preprocessing pipeline (cleaning, personally identifiable information anonymisation, sentence splitting, and sentence-level language filtering) to retain Maltese sentences while tolerating realistic code-switching. The resulting crowdsourced dataset contains 2,327 sentences annotated for positive (39%), negative (31%), and neutral (30%) sentiment. We integrate prior Maltese datasets to create a combined benchmark of 3,772 instances. We evaluate fine-tuned encoder models (BERTu, Glot500) and few-shot prompting with instruction-tuned multilingual LLMs (Aya-101, Gemma 2 Instruct 9B). On the full test set, five-shot Aya-101 attains 68.65 macro-F1, closely followed by a fine-tuned BERTu at 68.36 macro-F1. Error analysis reveals complementary strengths: BERTu better separates polarised classes, while Aya-101 tends to over-predict the neutral class. We release the dataset splits, code, and a fine-tuned BERTu model to facilitate further work in Maltese NLP and sentiment analysis.
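The headline results above are reported as macro-F1. This metric is the unweighted mean of the per-class F1 scores over positive, negative, and neutral, so each class counts equally regardless of how often it occurs. The sketch below computes it from scratch on a 0-100 scale to match the reported numbers. It is an illustration under stated assumptions, not the authors' evaluation code. The function name `macro_f1`, the `LABELS` tuple, and the toy labels in the usage example are all hypothetical. For typical inputs, the result should agree with scikit-learn's `f1_score(gold, pred, average="macro")` multiplied by 100.

```python
from collections import Counter

# Hypothetical label set mirroring the three SentiMalti classes.
LABELS = ("positive", "negative", "neutral")


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores, returned on a 0-100 scale."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct prediction for class g
        else:
            fp[p] += 1  # predicted p, but gold was something else
            fn[g] += 1  # missed an instance of class g
    f1_scores = []
    for label in labels:
        pred_count = tp[label] + fp[label]
        gold_count = tp[label] + fn[label]
        precision = tp[label] / pred_count if pred_count else 0.0
        recall = tp[label] / gold_count if gold_count else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    # Every class is weighted equally, regardless of its frequency.
    return 100 * sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Toy, hypothetical labels (not SentiMalti data): a model that
    # over-predicts "neutral" loses F1 on the polarised classes.
    gold = ["positive", "negative", "neutral", "positive"]
    pred = ["positive", "neutral", "neutral", "neutral"]
    print(round(macro_f1(gold, pred), 2))
```

In the toy example, the neutral class is found perfectly (recall 1.0). The surplus neutral predictions, however, lower neutral precision and also cost recall on positive and negative. Because macro-F1 averages the three classes equally, a tendency to over-predict neutral, such as the one the error analysis attributes to Aya-101, pulls the score down even when neutral itself is always recovered.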