Georgios Arampatzis

2026

DUTH at SemEval-2026 Task 9: Joint Multilingual Fine-Tuning for Online Polarization Detection
Georgios Arampatzis | Avi Arampatzis
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

Online polarization on social media presents substantial challenges for public discourse, content moderation, and large-scale social analytics across diverse linguistic and cultural contexts. A recent multilingual benchmark enables systematic evaluation of polarization detection across 22 languages and multiple sociopolitical events, providing a unified setting for studying socially grounded NLP under multilingual conditions. We present DUTH, a unified multilingual system for binary polarization detection based on joint fine-tuning of XLM-RoBERTa on the 22 languages of SemEval-2026 Task 9 Subtask 1. Our system uses a single shared encoder with a linear classification head and is trained jointly on the multilingual training set using mixed-precision optimization. On the official evaluation, the system achieved an average Accuracy of 0.822 and an average Macro-F1 of 0.780 across 22 languages. The results show that a simple jointly fine-tuned multilingual transformer provides a competitive and scalable baseline for online polarization detection, while still facing difficulties in implicit, sarcastic, and culturally grounded cases.
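
The abstract reports Accuracy and Macro-F1 averaged across languages. As a rough illustration of how such per-language averages could be computed for a binary task, here is a minimal sketch. The function names and the `(language, gold, pred)` row layout are assumptions made for this example and are not taken from the paper or from the official task scorer.

```python
from collections import defaultdict


def accuracy(gold, pred):
    """Fraction of examples whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        denom = 2 * tp + fp + fn
        # Convention assumed here: a class with no support and no predictions scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def average_over_languages(rows):
    """rows: iterable of (language, gold_label, predicted_label) tuples.

    Scores each language separately, then averages the per-language scores,
    so every language counts equally regardless of its test-set size.
    """
    by_lang = defaultdict(lambda: ([], []))
    for lang, gold, pred in rows:
        by_lang[lang][0].append(gold)
        by_lang[lang][1].append(pred)
    accs = [accuracy(g, p) for g, p in by_lang.values()]
    f1s = [macro_f1(g, p) for g, p in by_lang.values()]
    return sum(accs) / len(accs), sum(f1s) / len(f1s)
```

Averaging per-language scores, rather than pooling all examples, keeps low-resource languages from being dominated by larger test sets; whether the official evaluation aggregates exactly this way is not stated in the abstract.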

DUTH at SemEval-2026 Task 3: Multilingual Transformer Models for Dimensional Stance Prediction Across Tracks
Georgios Arampatzis | Avi Arampatzis
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

This paper presents DUTH, our system for Track A and Track B of SemEval-2026 Task 3 on Dimensional Sentiment Analysis, focusing on the Dimensional Aspect-Based Sentiment Regression (DimASR) subtask. DimASR requires predicting continuous Valence and Arousal (VA) scores for aspect terms in opinionated text and stance targets in public-issue discourse. Our approach uses a multilingual Transformer encoder fine-tuned end-to-end to jointly encode the input text and its corresponding aspect or stance target, followed by a regression head for VA prediction. We evaluate DUTH on the official multilingual and multidomain datasets and compare it against the shared-task baselines. Results show competitive performance, with improvements over the strongest official baseline in Track A and over the mBERT baseline in Track B, while yielding consistently stronger predictions for Valence than for Arousal.

DUTH at SemEval-2026 Task 1: Prompt-Based Zero-Shot Large Language Models for Constrained Multilingual Humor Generation
Georgios Arampatzis | Avi Arampatzis
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

Humor generation is a challenging problem for natural language processing systems due to its subjectivity, cultural dependence, and reliance on creative language use. These challenges are further amplified in constrained multilingual settings, where models must satisfy explicit lexical or topical requirements while producing short and humorous outputs. In this paper, we present DUTH’s system for SemEval-2026 Task A on constrained multilingual joke generation in English, Spanish, and Chinese. Our approach leverages instruction-tuned large language models in a zero-shot setting, combining prompt engineering, controlled decoding, and lightweight post-generation validation to enforce constraint satisfaction and language consistency. We evaluate multiple model families and parameter scales, including Qwen and Mistral models. Human evaluation demonstrates that larger models consistently outperform smaller ones, with Qwen2.5-14B-Instruct achieving the strongest overall performance. Error analysis highlights remaining challenges such as lexical constraint violations and cross-lingual interference.
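
The abstract mentions lightweight post-generation validation for constraint satisfaction and language consistency but does not describe the checks themselves. The sketch below shows one way such a filter might look; the function names, the substring-based word check, and the script-ratio thresholds are all assumptions for illustration, not the authors' implementation.

```python
import re

# Basic CJK Unified Ideographs block; a crude proxy for "written in Chinese".
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")


def satisfies_lexical_constraint(joke, required_words):
    """True if every required word occurs in the joke (case-insensitive substring match)."""
    lowered = joke.casefold()
    return all(word.casefold() in lowered for word in required_words)


def script_matches(joke, lang):
    """Heuristic language-consistency check based on the share of CJK letters.

    It only separates Chinese ("zh") from Latin-script output, so it cannot
    tell English from Spanish; a real system would need a language identifier.
    """
    letters = [ch for ch in joke if ch.isalpha()]
    if not letters:
        return False
    cjk_ratio = sum(1 for ch in letters if CJK_CHAR.match(ch)) / len(letters)
    if lang == "zh":
        return cjk_ratio >= 0.5  # assumed threshold
    return cjk_ratio < 0.1  # assumed threshold


def validate(joke, required_words, lang):
    """Accept a generated joke only if both checks pass."""
    return satisfies_lexical_constraint(joke, required_words) and script_matches(joke, lang)
```

A generation loop could call `validate` on each candidate and resample when it returns `False`, which is one plausible reading of how validation "enforces" the constraints.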

Co-authors

Avi Arampatzis 3

Venues

SemEval 3
WS 3
