Parush Gera

2026

Diagnosing Generalization in Open-Source LLMs for Stance Detection
Parush Gera | Tempestt Neal
Proceedings of the 15th Joint Conference on Lexical and Computational Semantics (*SEM 2026)

Stance detection identifies whether a text expresses support, opposition, or neutrality toward a target and is central to applications such as political analysis and misinformation monitoring. With the shift toward large language models (LLMs), stance classification increasingly relies on prompting and lightweight adaptation. Yet the generalization behavior of open-source LLMs across new targets and domains remains uneven. We conduct a large-scale diagnostic study of four open-source LLMs (3B–24B parameters), examining how model size, prompting strategies, and Low-Rank Adaptation (LoRA) interact across in-target, cross-target, and cross-domain settings. Across 912 experiments, three patterns emerge: (1) larger models improve prompting-based in-target performance, but this advantage diminishes after fine-tuning; (2) LoRA boosts in-target accuracy yet often harms cross-context transfer; (3) optimal prompting depends on model size. These results reveal a consistent tension between specialization and generalization, offering practical guidance for configuring LLM-based stance detection under transfer.


Understanding the Linguistic Cues Behind Stance Detection
Parush Gera | Tempestt Neal
Proceedings of the 15th Joint Conference on Lexical and Computational Semantics (*SEM 2026)

Stance detection seeks to determine whether a text expresses a position in favor of, against, or neutral toward a target. Despite advances in neural architectures, performance remains inconsistent across datasets. To better understand these disparities, we analyze over 75K samples from four benchmark datasets using six neural models, focusing on stylistic and pragmatic language features rather than architectures or external knowledge. We extract 43 features spanning lexical richness, syntactic complexity, affective tone, and hedging, and assess their impact through both Logistic Regression and SHAP analyses. Our findings reveal distinct stylistic profiles for each stance: "favor" is best detected when expressed concisely with minimal hedging; "against" when paired with strong negative emotions and greater lexical variety; and "none" when texts are lexically simple and emotionally neutral. Across classes, errors arise from excessive complexity, mixed emotional signals, and overuse of hedging. These results advance understanding of what drives success and failure in stance detection.
