Zouhir Essalmani

2026

NLP-FSDM at SemEval-2026 Task 4: Narrative Similarity via Multiple Negatives Ranking and Instruction-Based Embeddings
Abdessamad Benlahbib | Zouhir Essalmani | Achraf Boumhidi | Anass Fahfouh | Hamza Alami
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

The identification of narrative similarity is a complex NLP challenge that requires modeling deeper plot and thematic alignment rather than relying solely on lexical overlap. In this paper, we detail the participation of team NLP-FSDM in SemEval-2026 Task 4. Our approach utilizes the bge-large-en-v1.5 encoder. For Track A, we fine-tune it using Multiple Negatives Ranking Loss (MNRL), while for Track B we rely on the pretrained encoder to generate fixed narrative representations. We achieved an accuracy of 65.50% in Track A and 62.50% in Track B. This paper provides an extensive comparison of our results with competitive baselines and top-performing systems, analyzing the efficacy of dense encoders in low-resource narrative contexts.
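As a rough illustration of the Multiple Negatives Ranking Loss mentioned in the abstract, the sketch below computes it over a toy batch in plain Python. Each anchor is paired with one positive, and the other positives in the batch act as negatives. The function names, the toy vectors, and the `scale=20.0` value are assumptions made for illustration and are not taken from the paper; the actual system would operate on bge-large-en-v1.5 embeddings inside a training framework.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnrl_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over in-batch similarities.

    positives[i] is the target for anchors[i]; every other positive in
    the batch is treated as a negative. `scale` is an assumed temperature.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(anchors)


# Toy embeddings (hypothetical): aligned pairs vs. deliberately mismatched pairs.
aligned = mnrl_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = mnrl_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each anchor is most similar to its own positive and grows when another item in the batch is closer, which is why it suits settings with only positive pairs available.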


NLP-FSDM at SemEval-2026 Task 2: Temporal Smoothing and CCC-MAE Optimization for Balanced Longitudinal Affect Assessment
Abdessamad Benlahbib | Zouhir Essalmani | Achraf Boumhidi | Anass Fahfouh | Hamza Alami
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

This paper describes the NLP-FSDM system for SemEval-2026 Task 2, Subtask 1 on longitudinal affect assessment. The task requires predicting Valence and Arousal (V & A) scores for sequences of ecological essays and feeling words written over time. We adopt ModernBERT-large as a text encoder and formulate the task as a joint regression problem optimized using a Concordance Correlation Coefficient (CCC) loss combined with a lightly weighted Mean Absolute Error (MAE) term. To reduce variance induced by fine-tuning large transformers on relatively small user-specific datasets, we employ a three-seed ensemble. Finally, we introduce a lightweight post-inference temporal smoothing mechanism applied per user to improve within-user consistency. Our system achieves an r_composite of 0.546 for Valence and 0.453 for Arousal, demonstrating stable cross-dimensional performance without explicitly modeling sequential dependencies.
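The CCC-plus-MAE objective and the per-user smoothing step described in the abstract could look roughly like the following plain-Python sketch for a single affect dimension. The abstract does not give the MAE weight or the form of the smoothing, so `mae_weight=0.1`, the exponential moving average, and `alpha=0.8` are all assumptions chosen for illustration, as are the function names.

```python
def ccc(pred, gold):
    """Concordance Correlation Coefficient between two equal-length sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    return 2.0 * cov / denom if denom > 0 else 0.0


def ccc_mae_loss(pred, gold, mae_weight=0.1):
    """(1 - CCC) plus a lightly weighted MAE term; the weight is an assumed value."""
    mae = sum(abs(p - g) for p, g in zip(pred, gold)) / len(pred)
    return (1.0 - ccc(pred, gold)) + mae_weight * mae


def smooth_per_user(preds, alpha=0.8):
    """Exponential moving average over one user's time-ordered predictions.

    This is an assumed smoothing form; higher alpha keeps more of the raw prediction.
    """
    smoothed = []
    for p in preds:
        prev = smoothed[-1] if smoothed else p
        smoothed.append(alpha * p + (1.0 - alpha) * prev)
    return smoothed


# Hypothetical valence scores for one user.
perfect = ccc_mae_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = ccc_mae_loss([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
half_smoothed = smooth_per_user([0.0, 1.0], alpha=0.5)
```

The CCC term penalizes both poor correlation and shifts in mean or scale, while the small MAE term keeps predictions anchored to the absolute score range. Smoothing is applied after inference, so it needs no change to the model itself.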

Venues

SemEval (2)
WS (2)