Proceedings of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026)

Ali Hürriyetoğlu, Surendrabikram Thapa, Hristo Tanev (Editors)

Anthology ID: 2026.eeuca-1
Month: July
Year: 2026
Address: San Diego, California, USA
Venues: EEUCA | WS
Events: Annual Meeting of the Association for Computational Linguistics (2026) | Workshop on Event Extraction and Understanding: Challenges and Applications (2026) | Other Workshops and Events (2026)
Publisher: Association for Computational Linguistics
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.eeuca-1/
ISBN: 979-8-89176-402-6
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.eeuca-1.pdf

Overview of the Workshop on Event Extraction and Understanding: Challenges and Applications
Ali Hürriyetoğlu | Surendrabikram Thapa | Hristo Tanev | Laxmi Thapa | Surabhi Adhikari

This paper presents an overview of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026), held in conjunction with ACL 2026. Formerly known as CASE, the workshop continues its mission of bringing together researchers from natural language processing, machine learning, computational social science, and related disciplines to advance research on event extraction and understanding. This year’s edition particularly emphasized the growing influence of large language models (LLMs), multimodal learning, and weakly supervised methodologies in event extraction research. The workshop featured six regular research papers covering topics such as low-resource event extraction, reflective multi-agent architectures, symbolic auditing of procedural events, geopolitical event extraction, and generative event extraction strategies. In addition, EEUCA 2026 hosted two shared tasks focusing on toxicity detection in gaming communities and multimodal vaccine-critical meme analysis, attracting broad international participation and encouraging research on socially impactful applications of AI. The workshop highlights current advances, emerging challenges, and future directions in multilingual, multimodal, and socially aware event extraction systems.

Online gaming communities are increasingly affected by toxic communication, including harassment, threats, hate speech, and extremist content. Detecting such behavior is challenging due to the short, noisy, multilingual, and highly imbalanced nature of gaming chat data. To advance research in this area, we organized the Shared Task on Fine-Grained Toxicity Detection in Online Gaming at EEUCA 2026, co-located with ACL 2026. The task is based on the GameTox dataset, containing approximately 53,000 annotated chat utterances from World of Tanks across six toxicity categories. A total of 102 participants took part, and 35 teams submitted systems exploring approaches such as domain-adaptive pretraining, multilingual transfer learning, contrastive learning, LLM-based augmentation, and ensemble methods. Systems were evaluated using macro-averaged F1-score, with the top system achieving 0.7041 Macro F1. This paper presents an overview of the shared task, dataset, evaluation framework, participant methods, and key findings.
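
The macro-averaged F1 used for ranking can be illustrated with a minimal sketch. The function name and the toy labels below are made up for illustration and are not taken from the shared task's scoring script; the point is only that every class counts equally, so missing a rare class is costly even when accuracy is high.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores over the given label set."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A class that is never predicted and never present scores 0 here.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: class 0 dominates, class 2 is rare and always missed.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 0, 1, 0, 0]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, macro_f1(y_true, y_pred, labels=[0, 1, 2]))
```

On this toy input accuracy is 5/7 while macro F1 is about 0.49, because the missed rare class contributes an F1 of zero to the average.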

Vaccination-related memes on social media play an increasingly influential role in shaping public perception of immunization, often spreading both supportive messaging and vaccine-critical narratives through multimodal communication. Detecting such content is challenging due to the combined use of images, embedded text, sarcasm, humor, and cultural references. This paper presents an overview of the Shared Task on Multimodal Identification of Vaccine Critical Content on Social Media, organized as part of the 9th Workshop on Event Extraction and Understanding: Challenges and Applications (EEUCA 2026) at ACL 2026. The task is based on the VaxMeme dataset, a large-scale collection of vaccination-related memes annotated into three classes: Vaccine-critical, Neutral, and Pro-vaccine. A total of 77 participants registered for the competition, with 25 teams submitting systems for evaluation. Participating approaches included transformer-based multimodal architectures, vision-language models, ensemble methods, and instruction-tuned large language models. The best-performing system achieved a macro F1-score of 0.8494. This shared task provides insights into the strengths and limitations of current multimodal approaches for vaccine stance detection and highlights future directions for robust public health misinformation analysis.

Constructing a Silver Corpus for Weakly Supervised Vietnamese Event Extraction using Cross-Document N-ary Relation Filtering
Phạm Xuân Hiệu | Tuan Vu Minh | Mai-Vu Tran | Hoang-Quynh Le

Event extraction for low-resource languages such as Vietnamese is limited by the lack of large-scale annotated data. To address this, we propose a weakly supervised framework that constructs a silver corpus via pseudo-labeling. We introduce a cross-document n-ary relation filtering strategy to reduce noise by leveraging consistency across multiple articles describing the same event, and further enhance data diversity with schema-based augmentation. Experiments on the BKEE benchmark show consistent improvements, demonstrating the effectiveness of our approach. Data is available at: https://github.com/Larken1612/VietEE2.

When Tasks Share Structure: A Comparative Study of Training Strategies for Generative Event Extraction
Rishi Ravikumar | Riza Batista-Navarro

Event extraction requires performing two interdependent subtasks: event detection and event argument extraction. While prior work has explored pipelined and joint training approaches, the question of how best to coordinate training across these subtasks in generative LLM-based systems remains open. We present a systematic study comparing three training paradigms: disjoint, fully shared and hybrid weight allocation, instantiated as eight concrete strategies and evaluated on ACE2005 and RichERE across multiple instruction-tuned LLMs. Our findings show that training strategy has a consistent and meaningful effect on extraction accuracy, and that a clear best-performing strategy emerges across models and benchmarks. We believe that these findings could extend beyond event extraction to other information extraction tasks that decompose into interdependent subtasks.

A Qualia-Based Audit of Procedural Event Annotations
Kyeongmin Rim | Marc Verhagen | James Pustejovsky

Procedural event annotations record *what changed* but not the semantic relevance or grounding of the change: whether the annotated entity is the kind of thing whose state matters for the domain. We present Entity Qualia Structure (EQS), a per-entity sortal-type categorization (coarsened from Generative Lexicon’s type system to three categories: natural, artifactual, instrument) extracted from existing lexical resources. Applied to the OpenPI food domain, EQS reaches 84.7% coverage of the 518-item entity vocabulary; across 9367 transformation annotations, only 51.1% concern food entities themselves, while 30.2% record state changes of instruments, entities whose sortal type places them outside the food-state task. In a three-way comparison against existing cleanup efforts, EQS uniquely flags 15.6% of annotations that neither human re-annotation (OpenPI-C) nor LLM salience scoring (OpenPI 2.0) catches. Analysis of the *agentive* quale reveals that 93% of agentive-positive annotations involve instruments rather than food: entity creation can only be detected when the agentive feature is paired with the associated verb’s event semantics.

Research on Event Extraction (EE) in South Asian languages is crucial for understanding information dissemination and enabling automated news analysis in morphologically complex, low-resource environments. To address the scarcity of high-quality, publicly available datasets, we present Nepali Event Extraction (NepEE), a manually annotated corpus comprising 10,226 Devanagari sentences. The dataset includes annotations for trigger spans and event types, achieving high inter-annotator agreement with Fleiss’ kappa = 0.812 for trigger identification and kappa = 0.855 for event classification. Our dataset was developed through a rigorous iterative three-phase protocol involving five expert native speakers to ensure linguistic precision. We conduct benchmarking across a broad spectrum of approaches, including classical feature-based models, five fine-tuned Transformer encoders, and contemporary instruction-tuned Large Language Models (LLMs) using zero-shot and fixed few-shot prompting. Our analysis shows that Indic-specialized Transformers achieve superior classification performance, while traditional methods and few-shot prompting struggle with the challenges of exact span extraction in morphologically complex contexts. Furthermore, we quantify performance differences between sentence-level and span-level tasks, providing strong baselines for future research. The findings and the released NepEE dataset provide a valuable resource for advancing event understanding in low-resource languages (LRLs). All code and resources are available at https://github.com/SUJAL390/EEUCA-ACL-2026-Trigger-Phrase-Identification-and-Event-Classification-in-Low-Resource-Languages.
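
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of Fleiss' kappa computed from a table of per-item category counts. The function name and the toy counts are illustrative assumptions; the toy table is made up and is not NepEE data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every item needs the same rater count."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each item must be rated by the same number of raters")
    n_cats = len(counts[0])

    # Share of all assignments that went to each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_cats)]
    # Observed agreement per item: fraction of rater pairs that agree.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]

    p_bar = sum(p_item) / n_items       # mean observed agreement
    p_exp = sum(p * p for p in p_cat)   # agreement expected by chance
    if p_exp == 1.0:
        return 1.0  # degenerate case: a single category was used throughout
    return (p_bar - p_exp) / (1.0 - p_exp)


# Hypothetical toy table: 4 items, 5 raters, 2 categories.
toy = [[5, 0], [0, 5], [3, 2], [2, 3]]
print(fleiss_kappa(toy))
```

Here two items have unanimous ratings and two are split 3 to 2, which gives a mean observed agreement of 0.7 against a chance level of 0.5.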

A Self-Reflective LLM-based Architecture for Semi-Open Event Extraction
Hristo Tanev | Michel de Bollivier | Bertrand De Longueville

We present a multi-agent reflective architecture for event extraction based on generative large language models (LLMs). Our architecture is the first of its kind to perform Semi-Open Event Extraction (SOEE), a hybrid framework that combines a fixed set of event template fields with dynamically generated attributes. Another novel feature of this system is self-reflection, a type of LLM-based reasoning defined as the generation of questions about missing or implicit event information and finding their answers within the system itself. We model event extraction as an iterative dialogue between a reflective LLM-based agent, which generates questions to uncover missing event information, and a set of expert agents, which provide domain-aware answers to these questions. The expert agents also generate the initial event template using a generative LLM. Evaluated in the health domain, our event extraction system shows very promising results, demonstrating that LLM-based reflective multi-agent reasoning can accurately perform event extraction and expand the event template in a creative and comprehensive manner.

GENOME: A New Geopolitical Event Methodology and Dataset using Large Language Models
Alessandro Dell’Orto | Jesse Kommandeur

Quantitative research in International Relations relies heavily on structured event data, yet existing automated datasets lack up-to-date coverage of both conflictual and cooperative interactions. We introduce GENOME (Geopolitical Event News Observatory, Mapping, and Extraction), an automatically extracted dataset that implements PLOVER’s 16 event types and extends its Actor–Recipient schema with a Third Party role to capture multilateral relations from newswire data. GENOME’s pipeline comprises event extraction, ontology-based classification, entity normalization, and deduplication, leveraging GPT models with one-shot prompting and enforced structured outputs. We compare GENOME against the POLECAT dataset over a five-month overlap period across event volume, temporal dynamics, and geographical coverage. Results show that while the two datasets align closely on conflict event types, GENOME captures a more balanced distribution of cooperative events, particularly verbal interactions nearly absent in POLECAT. GENOME also demonstrates improved temporal precision by attributing events to their inferred date of occurrence rather than publication date, and effective deduplication of highly covered events.

FNLP412@EEUCA 2026: Understanding Toxic Behavioral Intent in Gaming Chat Logs using Transfer Learning and Synthetic Data Augmentation
Mihai Radu Radulescu

Our paper explores several machine learning methods for detecting toxic language in gaming-related chat utterances. We start with the GameTox dataset, perform some data preprocessing, and augment the minority classes with LLM-generated synthetic data. We then set a baseline using a classic Logistic Regression model and explore several approaches to surpassing it by leveraging the leading multilingual transformer models (XLM-RoBERTa and DeBERTa-V3) to classify our test data. We achieve a top result of 0.6725 Macro-F1 (2nd place on the shared task leaderboard) using an MDeBERTa-V3 model which we pretrained on the Jigsaw dataset for 1 epoch and then fine-tuned on our GameTox data for 5 epochs.

wangkongqiang@EEUCA 2026: Understanding Toxic Behavioral Intent in Gaming Chat Logs
Kongqiang Wang | Peng Zhang | Quingli Tan

Our team was interested in content classification and labeling for toxicity detection in the chat logs of online gaming communities. We joined the shared task on Understanding Toxic Behavioral Intent in Gaming Chat Logs at EEUCA, held with ACL 2026. The goal of the task is to develop systems that assign a content classification label to a player’s utterance (e.g., Hate and Harassment, Threats, Non-toxic), reflecting the intent behind it. The dataset for this task has six labels: Non-toxic (0), Insults and Flaming (1), Other Offensive Texts (2), Hate and Harassment (3), Threats (4), and Extremism (5). Performance is ranked by macro F1-score. The task uses 53,000 game chat utterances from World of Tanks. Our group used supervised learning with multiple pre-trained models and fine-tuned Qwen2 LLMs. Our best result on the shared task test set was a Macro F1 score of 0.5776, Accuracy of 0.9075, Precision (Macro) of 0.6847, and Recall (Macro) of 0.5343, obtained by fine-tuning the qwen2_7B LLM and ranking 8th among all teams. The complete code of this entire project can be found at our GitHub address.

wangkongqiang@EEUCA 2026: Multimodal Identification of Vaccine Critical Content on Social Media
Kongqiang Wang | Peng Zhang | Quingli Tan

Our team was interested in content classification and labeling for multimodal detection of vaccine-critical memes on social media. We joined the shared task on Multimodal Identification of Vaccine Critical Content on Social Media at EEUCA, held with ACL 2026. The goal of the task is to develop systems that assign a content classification label to a vaccine-related meme (e.g., Vaccine critical, Neutral, Pro-vaccine), reflecting its intent. The dataset for this task has three labels: Vaccine critical (0), Neutral (1), and Pro-vaccine (2). Performance is ranked by macro F1-score. This shared task is based on the VaxMeme dataset, a collection of over 10,000 manually annotated vaccination-related memes, designed to support multimodal vaccine-critical meme detection. Our group used supervised fine-tuning of pre-trained models and Large Language Models (LLMs), including Qwen2 LLMs and Llama series LLMs based on Llama-Factory. Our best result on the shared task test set was a Macro F1 score of 0.8153, Accuracy of 0.8185, Precision (Macro) of 0.8151, and Recall (Macro) of 0.8159, obtained by fine-tuning the qwen2_1.5B LLM and ranking 12th among all teams. The complete code of this entire project can be found at our GitHub address.

Quasar@EEUCA 2026: Multimodal Deep Learning for Vaccine Stance Detection in Memes
Adiba Fairooz Chowdhury | MD Sagor Chowdhury

Vaccine stance detection in multimodal memes has emerged as an important yet challenging task, requiring models to interpret both textual and visual cues that jointly convey opinions. The difficulty lies in capturing subtle semantic interactions and handling class imbalance across stance categories. In this paper, we present our system developed for the VaxMeme 2026 Shared Task at EEUCA 2026. Our approach leverages a soft-voting ensemble of complementary models, combining DeBERTa-v3-large and RoBERTa-large for rich textual representation with CLIP (ViT-B/32) for joint vision-language understanding. We incorporate domain-specific preprocessing and techniques such as random token deletion, image enhancement, and balanced class oversampling to address dataset limitations. Through extensive ablation studies, we identify balanced class oversampling as the most effective component, significantly improving performance across models. Our final system achieves a macro F1-score of 0.8306, securing 8th place among 25 teams, demonstrating the effectiveness of ensemble-based multimodal learning for stance detection.
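
Soft voting, as named in the abstract above, averages the class probabilities of several models before taking the argmax. The sketch below is a generic illustration under assumed names and made-up probabilities; the abstract does not state whether the models are weighted, so equal weights are an assumption.

```python
def soft_vote(prob_lists, weights=None):
    """Average per-model class probabilities; return (predicted class, averaged vector)."""
    weights = weights or [1.0] * len(prob_lists)  # assumption: equal weights by default
    total = sum(weights)
    n_classes = len(prob_lists[0])
    averaged = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return predicted, averaged


# Hypothetical probabilities from three models for one meme
# (class order: Vaccine-critical, Neutral, Pro-vaccine).
model_probs = [
    [0.90, 0.10, 0.00],  # one model is very sure of class 0
    [0.40, 0.60, 0.00],  # two models lean mildly towards class 1
    [0.45, 0.55, 0.00],
]
label, averaged = soft_vote(model_probs)
print(label, averaged)
```

In this toy case a majority (hard) vote would pick class 1, while soft voting picks class 0 because the single confident model outweighs two marginal ones.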

CUET_SYNTHETICA@EEUCA 2026: Gated Cross-Modal Attention with Domain-Adapted Text Encoding for Vaccine-Critical Meme Detection
Sumaiya Zaman | Miftahul Jannat Rishta | Shiti Chowdhury

Vaccine-critical memes have emerged as a growing challenge for public health communication, combining images and text to spread misinformation in ways that are difficult to detect automatically. In this paper, we describe our system for the EEUCA 2026 Shared Task on Multimodal Vaccine-Critical Meme Detection, classifying memes from the VaxMeme dataset into Vaccine-Critical, Neutral and Pro-Vaccine categories. We experiment with multiple text encoders and visual backbones, finding that Twitter-RoBERTa fused with CLIP ViT-L/14 through gated cross-modal attention achieves a test macro F1 of 0.8357. We further show that domain-specific pretraining outperforms larger general-purpose models, highlighting the importance of domain adaptation over raw model scale. Finally, our system secured the 3rd position on the shared task leaderboard.

wenbin@EEUCA 2026: MoEs-VaxAgent, A Two-Stage Framework for Multimodal Vaccine Critical Meme Detection
Wenbin Shen

Memes on social media have emerged as a crucial medium for disseminating vaccine-related viewpoints, yet their inherent irony, metaphor, and text-image misalignment pose significant challenges to automatic detection. In this paper, we propose MoEs-VaxAgent, a two-stage multimodal framework for vaccine critical meme detection. First, we design a dynamic routing Mixture-of-Experts module capable of adaptively capturing multi-granular semantic cues within memes. Second, to address hard samples located at the decision boundaries, we introduce an uncertainty-aware multi-agent rectification mechanism to perform a secondary detection on samples identified with low confidence in the first stage. In the EEUCA 2026 Shared Task on Multimodal Vaccine Critical Meme Detection, our system achieved a Macro F1-score of 0.8205, ranking 9th on the official leaderboard. Furthermore, we discuss various exploratory strategies evaluated during the competition and provide a detailed analysis of the model’s performance.
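
The hand-off between the two stages described above can be sketched as a confidence gate. This is only an illustration: the abstract does not say how uncertainty is measured, so the maximum class probability and the threshold of 0.6 used below are assumptions, as are the function name and the toy probabilities.

```python
def route_by_confidence(stage1_probs, threshold=0.6):
    """Split samples into confident ones (keep the stage-1 label) and uncertain
    ones (forward to a second stage). Confidence here is the top class
    probability, which is an assumed stand-in for the paper's measure."""
    confident, uncertain = [], []
    for index, probs in enumerate(stage1_probs):
        label = max(range(len(probs)), key=probs.__getitem__)
        bucket = confident if probs[label] >= threshold else uncertain
        bucket.append((index, label))
    return confident, uncertain


# Hypothetical stage-1 outputs for three memes.
stage1 = [
    [0.90, 0.05, 0.05],  # clear case
    [0.40, 0.35, 0.25],  # near the decision boundary
    [0.10, 0.15, 0.75],  # clear case
]
confident, uncertain = route_by_confidence(stage1)
print(confident, uncertain)
```

Only the samples in `uncertain` would then be passed to the more expensive second-stage check, which keeps its cost proportional to the number of hard cases.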

thaulab@EEUCA 2026: Who Said What to Whom? A Targeting-Aware Neural-Symbolic Pipeline for Gaming Toxicity Detection
Anmol Guragain | Marcos Estecha-Garitagoitia | Luis Fernando D’Haro | Ricardo de Córdoba

This paper describes our system for the EEUCA 2026 Shared Task on toxicity classification in gaming chat. We implement a three-stage pipeline combining an ensemble of two compact transformers (DeBERTa-v3-base, 184M; XLM-RoBERTa-base, 278M) with a Linguistically-Informed Mediator (LIM) that resolves inter-model disagreements through corpus-backed lexical normalization, class-conditional unigram scoring, multilingual profanity detection, and agentive targeting analysis grounded in speech act theory. The LIM specifically targets the minority classes (Hate and Harassment, Threats, and Extremism), which are the most safety-critical categories in real-world gaming moderation. To address the extreme class imbalance (1,450:1 Non-toxic to Extremism ratio), we introduce a two-stage data augmentation strategy using only the provided training data. Our system achieves a Macro F1 of 0.6441 and accuracy of 0.9062 on the official test set, ranking 3rd in Macro F1 and 1st in accuracy among all teams. The proposed pipeline is domain-portable: adapting to other gaming platforms requires substituting only the game-specific entity lexicon. Code is publicly available at https://github.com/Anmol2059/thaulab_EEUCA.

syuhhh@EEUCA 2026: A Three-Stage Progressive Training Framework for Fine-Grained Toxicity Detection in Online Gaming Communities
Yuhao Shi | Yu Wang | Shengjie Zhao

This paper presents our 1st-place system for the Shared Task on Fine-Grained Toxicity Detection in Online Gaming (GameTox) at the 9th EEUCA Workshop, co-located with ACL 2026. The task targets 6-class fine-grained toxic intent classification on the official GameTox dataset, comprising 53,000 real-world World of Tanks chat utterances. We propose a three-stage progressive training framework built on XLM-RoBERTa-large: (1) gaming domain adaptive MLM pre-training, (2) multilingual toxicity transfer fine-tuning, and (3) supervised contrastive learning (SCL)-enhanced target task tuning. We further incorporate LLM-driven data augmentation and long-tailed class synthesis. Our system achieves a Macro F1 of 0.7041, ranking 1st among 35 teams. Ablation studies validate each module’s contribution, and we release our code to facilitate follow-up research.

CSECU-Learners@EEUCA 2026: Vaccine Critical Memes Identification using Two-Stage Early Fusion of Transformers
Monir Ahmad | Md. Saif Uddin

Memes have emerged as a fast and influential way to share information online, particularly during major public health events like COVID-19 vaccination. While they can support awareness and encourage positive behavior, they are also widely used to spread misinformation and vaccine-critical views. These messages are often expressed through sarcasm and implicit meaning, which makes automatic detection difficult. To tackle this problem, EEUCA 2026 introduces a shared task based on the VaxMeme dataset for multimodal vaccine critical meme detection. The task encourages us to design models that can jointly understand both image and text, capturing the underlying context more effectively. In this work, we present our approach to this task by proposing a two-stage early fusion framework that integrates multiple transformer-based encoders. We train our model using focal loss to give more attention to difficult samples. Our experimental results show that our method performs competitively in the shared task, demonstrating its effectiveness for this problem.
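
The effect of focal loss mentioned above can be shown with a small sketch of the standard formulation, in which cross-entropy is scaled by (1 - p_t) raised to a focusing parameter gamma. The abstract does not report gamma, so the value 2.0 below is an assumption, and the probabilities are made up.

```python
import math


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t), where p_t is the
    predicted probability of the true class. gamma=0 gives plain cross-entropy."""
    p_t = max(probs[target], 1e-12)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Hypothetical predictions for two samples whose true class is 0.
easy = [0.90, 0.05, 0.05]  # already classified with high confidence
hard = [0.20, 0.50, 0.30]  # misclassified

for name, probs in (("easy", easy), ("hard", hard)):
    ce = focal_loss(probs, target=0, gamma=0.0)
    fl = focal_loss(probs, target=0, gamma=2.0)
    print(name, round(ce, 4), round(fl, 4))
```

The easy sample's loss is scaled down by a factor of 0.01 and the hard sample's by only 0.64, so training gradients concentrate on the difficult examples.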

ShriNep@EEUCA 2026: RAKSHAK – Multi-Task DeBERTa with Rationale Distillation and Jigsaw-Augmented Training for Toxic Intent Classification
Binayak Karki | Aryan Kafle | Pingala Ghimire

This paper presents two systems for the GameTox Shared Task at the Workshop on EEUCA at ACL 2026, which requires classifying World of Tanks chat utterances into six fine-grained toxic intent categories (Labels 0–5). Severe class imbalance, domain-specific multilingual slang, and extremely scarce data for rare categories such as Threats (Label 4, 60 samples) and Extremism (Label 5, 24 samples) make this a challenging classification problem. Our primary submission, RAKSHAK (rakṣaka, Sanskrit for "Protector"), is a multi-task DeBERTa-v3-base framework combining rationale distillation from Qwen2.5-14B, Supervised Contrastive Loss, and dedicated rare-class binary heads. RAKSHAK’s training data is augmented with cross-domain transfer from the Jigsaw Toxic Comment dataset (16,225 samples mapped to Labels 1–4) and 100 LLM-generated extremism samples for Label 5. Our secondary system (M1) fine-tunes DeBERTa-v3-base with Focal Loss on the original GameTox data plus the same 100 extremism samples, without Jigsaw transfer. RAKSHAK achieves a Macro F1 of 0.5883 on the official test set, ranking 7th out of 35 participating teams, while M1 achieves 0.5252 Macro F1. An ablation comparing M1 with and without Jigsaw data shows that cross-domain transfer accounts for +2.6 F1 points, while RAKSHAK’s multi-task architecture contributes a further +3.7 points.

_alexcristea@EEUCA 2026: A Robust Early-Fusion ERNIE Pipeline for Multimodal COVID-19 Vaccine Meme Classification
Cristea Alexandru-Marian | Costin Ionescu

This paper presents our system for the EEUCA 2026 shared task on Multimodal Vaccine Critical Meme Detection. The task focuses on categorizing social media memes from the VaxMeme dataset into three stances: Vaccine Critical, Neutral, and Pro-Vaccine. To tackle the inherent challenges of internet sarcasm, implicit context, and high label noise, we propose a robust, heavily regularized text-fusion pipeline. Rather than relying on computationally heavy visual encoders, we extract text directly from the images via OCR and concatenate it with the user’s social media post, processing the unified context through an ERNIE 2.0-Large encoder. To combat the severe overfitting typical in subjective meme datasets, we replace the standard classification head with a Multi-Sample Dropout architecture, averaging predictions across five parallel dropout masks (p = 0.3). Our optimized, lightweight text-only pipeline achieves a peak Macro F1 score of 0.834. Furthermore, an ablation study utilizing Focal Loss reveals that our primary solution using standard Cross-Entropy provides superior robustness against the inherent label noise found in internet meme datasets.
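
A multi-sample dropout head of the kind described above can be sketched in plain Python. The sketch assumes that "averaging predictions" means averaging the logits of one shared linear layer applied under five independent dropout masks; the function names, the tiny feature vector, and the weights are made up, and a real system would use a deep learning framework.

```python
import random


def linear(features, weights, bias):
    """Plain linear layer: one output logit per weight row."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def multi_sample_dropout_logits(features, weights, bias, n_samples=5, p=0.3, rng=None):
    """Apply the same linear head under n_samples independent dropout masks
    and average the resulting logits (training-time forward pass)."""
    rng = rng or random.Random(0)
    keep = 1.0 - p
    summed = [0.0] * len(bias)
    for _ in range(n_samples):
        # Inverted dropout: zero each feature with probability p and rescale
        # the survivors by 1/keep so the expected activation is unchanged.
        dropped = [x / keep if rng.random() < keep else 0.0 for x in features]
        logits = linear(dropped, weights, bias)
        summed = [s + l for s, l in zip(summed, logits)]
    return [s / n_samples for s in summed]


# Hypothetical 4-dimensional encoder output and a 3-class head.
features = [0.5, -1.0, 2.0, 0.1]
weights = [[0.2, 0.1, -0.3, 0.0], [0.0, 0.4, 0.1, -0.2], [-0.1, 0.0, 0.3, 0.5]]
bias = [0.0, 0.1, -0.1]
print(multi_sample_dropout_logits(features, weights, bias))
```

Because the masks share one set of head weights, the extra cost is limited to the final layer, while the averaged output is less sensitive to any single mask.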

PSK@EEUCA 2026: Fine-tuning Large Language Models with Synthetic Data Augmentation for Multi-class Toxicity Detection in Gaming Chat
Srikar Kashyap Pulipaka

This paper describes our system for the EEUCA 2026 Shared Task on Understanding Toxic Behavior in Gaming Communities. The task involves classifying World of Tanks chat messages into six toxicity categories: Non-toxic, Insults/Flaming, Other Offensive, Hate/Harassment, Threats, and Extremism. We explore multiple approaches including encoder-based models, instruction-tuned LLMs with LoRA fine-tuning, hierarchical classification, one-vs-rest strategies, and various ensemble methods. Our best system combines Llama 3.1 8B with carefully calibrated 5% synthetic data augmentation, achieving an F1-macro score of 0.6234 on the test set, placing 4th out of 35 participating teams. We provide extensive analysis of the dataset’s annotation patterns and their impact on model generalization, revealing a critical “validation trap” phenomenon where high validation performance correlates with poor test transfer.

TAGA@EEUCA 2026: Token-Attribution Guided Attention for Fine-Grained Toxic Behaviour Classification in Online Gaming Communities
Akshyat Shah | Shashi Sah | Aryan Gupta | Kavinder Singh

Online gaming brings together large communities of players who interact in real time. Toxic behavior in online chat is common and can harm players and deter them from participating. Automated moderation is therefore a necessity, but it is difficult because game chat mixes domain-specific slang, deliberate obfuscation, and informal "gamer" language, and offers very little support for categories such as threats and extremism. This paper describes the TAGA (Token-Attribution Guided Attention) system submitted to the EEUCA 2026 Shared Task on Understanding Toxic Behavior in Gaming Communities. TAGA employs a leave-one-out attribution method using the Detoxify toxicity scorer to compute per-token attribution scores across multiple toxicity dimensions, which are then projected into learned attention biases that steer the model toward toxicity-indicative tokens. Through a five-phase ablation study, we demonstrate that each component (domain-specific preprocessing, focal loss with label smoothing, attribution-guided attention pooling, and dual-model Detoxify features with strategic oversampling) contributes to a cumulative gain in macro-F1 over the DeBERTa-v3-base baseline. The final system achieves a test macro-F1 score of 0.618 and, importantly, produces non-zero predictions under the extreme data imbalance present in the shared task dataset.

LilyMeme@EEUCA 2026: Multimodal Vaccine Meme Stance Detection with Task-Adapted MemeCLIP and Complementary Ensembling
Yixuan Li | Xiaolong Yin | Yang Yang

Memes have emerged as a prominent medium for conveying public sentiment on sensitive health topics such as vaccination. Unlike conventional multimodal tasks, memes feature implicit stances, sarcastic nuances, and complex cross-modal interactions, posing significant challenges for accurate stance detection. This paper presents our approach for the VaxMeme Shared Task @EEUCA 2026, which aims to classify vaccine-related memes into three distinct classes: Vaccine-critical, Neutral, and Pro-vaccine. Building upon MemeCLIP, we systematically enhance our framework via task-specific adaptation, lightweight cross-modal fusion, noise-aware training, LLM-assisted semantic augmentation, and inference-stage optimization, ultimately ensembling multiple complementary variants for final predictions. Our ensemble method achieves a Macro-F1 score of 0.8494 on the official test set, securing first place and demonstrating the critical efficacy of noise-aware training and late-stage ensembling for robust stance identification.

LINUS@EEUCA 2026: Fine-grained Toxicity Detection in Gaming Chat using Multilingual Transformers
Prajwal Ghimire | Aashish Mahato | Sunil Regmi

The detection of toxic behavior in online gaming communities is crucial for maintaining safe digital spaces, yet remains challenging due to subtle context-dependent and intent-driven language. The GameTox dataset consists of around 53K World of Tanks chat utterances annotated across six categories: Non-toxic, Insults and Flaming, Other Offensive Texts, Hate and Harassment, Threats, and Extremism. Our best-performing approach, across experiments with multiple transformer-based architectures, is based on the multilingual BERT variant mmBERT-base fine-tuned with class-weighted cross-entropy loss. The best mmBERT-base model achieved a Macro F1 of 0.5882 during validation and an official test Macro F1 of 0.5104 on the shared task leaderboard. An internal held-out evaluation on a development split yielded 0.4282, which we analyze to understand distributional sensitivity to gaming slang and class imbalance. The code is available at: https://github.com/sunilRegmi-ai/eeuca-toxicity-detection.

Linus@EEUCA 2026: Multimodal and Text-Only Approaches to Vaccine-Critical Meme Detection
Darwin Acharya | Shiv Ram Saud | Sunil Regmi

In this paper, we describe our participation in the Shared Task on Multimodal Identification of Vaccine Critical Content on Social Media (VaxMeme) of EEUCA 2026, a satellite of ACL 2026. We tackle the classification of Twitter-based vaccine memes into anti-vaccine, neutral, and pro-vaccine categories using the VaxMeme dataset with 8,195 train, 1,024 val, and 1,025 test samples. We experiment with two different architecture families: (i) multimodal hybrids, which pair CLIP ViT-B/32 for images with BERT-based models for texts (BERT-base-uncased, ModernBERT) using a late fusion strategy based on concatenation of L2-normalized feature vectors, and (ii) text-only models, which apply pre-trained text encoders (BERT-base-uncased, RoBERTa-base, ModernBERT-base, DistilBERT-base, DeBERTa-v3-base) to post_text. In both cases, we use a three-layer feed-forward network with GELU activation for classification. We use class-weighted cross-entropy loss, differential learning rates, the AdamW optimizer, gradient accumulation, the OneCycleLR scheduler, and early stopping on the val set for optimization. Data augmentation is applied for the multimodal CLIP-based approach only. The winning approach among those tested is the text-only BERT-base-uncased with a macro-F1 of 0.8102, which is ahead of the CLIP + BERT-base hybrid model, which achieves a test macro-F1 of 0.7603.
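
Late fusion by concatenating L2-normalized feature vectors, as described above, can be sketched as follows. The function names and the tiny vectors are illustrative assumptions; real CLIP and BERT embeddings have hundreds of dimensions.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps guards against zero vectors)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def late_fusion(image_vec, text_vec):
    """Normalize each modality separately, then concatenate into one feature vector."""
    return l2_normalize(image_vec) + l2_normalize(text_vec)


# Hypothetical low-dimensional stand-ins for an image and a text embedding.
image_embedding = [3.0, 4.0]
text_embedding = [0.0, 5.0, 0.0]
fused = late_fusion(image_embedding, text_embedding)
print(fused)
```

Normalizing before concatenation keeps one modality from dominating the classifier input merely because its encoder produces larger activations.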