Pushkar Arora

2026

CoPol at SemEval-2026 Task 9: Modeling Polarization Type Co-occurrence with Label Correlation Networks
Pushkar Arora
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

POLAR-LDA is a label-dependency-aware system for SemEval-2026 Task 9 (multi-label polarization type classification). It augments an mDeBERTa-v3-base encoder with a Label Correlation Network (language-specific directed co-occurrence matrices + GAT), an Asymmetric Loss tuned for extreme positive scarcity, and a language-grouped ensemble. The system scores 0.567 macro F1 across 22 languages (from 0.784 for Hindi down to 0.256 for Italian) and shows clear ablation gains (ASL +0.041, LCN +0.030, ensemble +0.018). Key findings: absolute data voids (0–1 positive examples) form an unrecoverable floor for supervised learning; label co-occurrence is culturally situated (e.g., political↔religious in Indic vs. political↔racial in some Western languages) and benefits from language-specific graphs; and per-label training volume predicts cross-lingual performance better than linguistic family. Limitations are honest and important: noisy AL estimates under scarcity, an incoherent residual "other" category, and domain mismatch between pretraining corpora and polarization discourse. Overall, the paper offers a strong shared-task system and useful empirical diagnostics: practical and well-executed, but incrementally novel methodologically.
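The language-specific directed co-occurrence matrices mentioned above can be illustrated with a minimal sketch. It assumes that entry (i, j) estimates the conditional frequency of label j given label i, which is one common way to make a co-occurrence matrix directed; the summary does not confirm that the system uses this exact definition. The function names, language codes, label order, and toy data are all hypothetical.

```python
from typing import Dict, List, Sequence


def directed_cooccurrence(label_rows: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return A where A[i][j] estimates P(label j present | label i present).

    Assumption: this conditional-frequency definition is illustrative only;
    it is not taken from the paper.
    """
    if not label_rows:
        return []
    n_labels = len(label_rows[0])
    pair_counts = [[0] * n_labels for _ in range(n_labels)]
    for row in label_rows:
        active = [k for k, value in enumerate(row) if value]
        for i in active:
            for j in active:
                pair_counts[i][j] += 1  # diagonal accumulates per-label support

    matrix = [[0.0] * n_labels for _ in range(n_labels)]
    for i in range(n_labels):
        support = pair_counts[i][i]  # number of examples carrying label i
        if support == 0:
            continue  # a label with no positives keeps an all-zero row
        for j in range(n_labels):
            if i != j:
                matrix[i][j] = pair_counts[i][j] / support
    return matrix


def per_language_matrices(
    examples_by_language: Dict[str, Sequence[Sequence[int]]]
) -> Dict[str, List[List[float]]]:
    """Build one directed matrix per language instead of a single pooled one."""
    return {lang: directed_cooccurrence(rows) for lang, rows in examples_by_language.items()}


# Made-up multi-hot rows; hypothetical label order: political, religious, racial.
toy = {
    "hin": [[1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 0, 0]],
    "ita": [[1, 0, 1], [0, 0, 1], [0, 0, 0]],
}
matrices = per_language_matrices(toy)
```

The matrix is asymmetric by construction: in the toy "hin" data, every religious example is also political, but only two of the three political examples are religious. A label with zero positives yields an all-zero row, which mirrors the "data void" case in which no dependency can be estimated.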


Generative AI, powered by Large Language Models (LLMs), is increasingly deployed in industry across healthcare decision support, financial analytics, enterprise retrieval, and conversational automation, where reliability, efficiency, and cost control are critical. In such settings, models must satisfy strict constraints on energy, latency, and hardware utilization, not accuracy alone. Yet prevailing evaluation pipelines remain accuracy-centric, creating a Deployment–Evaluation Gap: the absence of operational and economic criteria in model assessment. To address this gap, we present EDGE-EVAL, an industry-oriented benchmarking framework that evaluates LLMs across their full lifecycle on legacy NVIDIA Tesla T4 GPUs. Benchmarking LLaMA and Qwen variants across three industrial tasks, we introduce five deployment metrics: Economic Break-Even (N_break), Intelligence-Per-Watt (IPW), System Density (ρ_sys), Cold-Start Tax (C_tax), and Quantization Fidelity (Q_ret). These capture profitability, energy efficiency, hardware scaling, serverless feasibility, and compression safety, respectively. Our results reveal a clear efficiency frontier: models in the < 2B parameter class dominate larger baselines across economic and ecological dimensions. LLaMA-3.2-1B (INT4) achieves ROI break-even in 14 requests (median), delivers 3× higher energy-normalized intelligence than 7B models, and exceeds 6,900 tokens/s/GB under 4-bit quantization. We further uncover an efficiency anomaly: while QLoRA reduces memory footprint, it increases adaptation energy by up to 7× for small models, challenging prevailing assumptions about quantization-aware training in edge deployment.

Co-authors

Sushant Kumar Ray 1

Ebad Shabbir 1

Venues
