2025
pdf
bib
abs
The Enemy from Within: A Study of Political Delegitimization Discourse in Israeli Political Speech
Naama Rivlin-Angert
|
Guy Mor-Lan
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
We present the first large-scale computational study of political delegitimization discourse (PDD), defined as symbolic attacks on the normative validity of political entities. We curate and manually annotate a novel Hebrew-language corpus of 10,410 sentences drawn from parliamentary speeches (1993-2023), Facebook posts, and leading news outlets (2018-2021), of which 1,812 instances (17.4%) exhibit PDD and 642 carry additional annotations for intensity, incivility, target type, and affective framing. We introduce a two-stage classification pipeline, and benchmark finetuned encoder models and decoder LLMs. Our best model (DictaLM 2.0) attains an F1 of 0.74 for binary PDD detection and a macro-F1 of 0.67 for classification of delegitimization characteristics. Applying this classifier to longitudinal and cross-platform data, we see a marked rise in PDD over three decades, higher prevalence on social media versus parliamentary debate, greater use by male politicians than by their female counterparts, and stronger tendencies among right-leaning actors, with pronounced spikes during election campaigns and major political events. Our findings demonstrate the feasibility and value of automated PDD analysis for analyzing democratic discourse.
pdf
bib
abs
HebID: Detecting Social Identities in Hebrew-language Political Text
Guy Mor-Lan
|
Naama Rivlin-Angert
|
Yael R. Kaplan
|
Tamir Sheafer
|
Shaul R. Shenhav
Findings of the Association for Computational Linguistics: EMNLP 2025
Political language is deeply intertwined with social identities. While social identities are often shaped by specific cultural contexts, existing NLP datasets are predominantly English-centric and focus on coarse-grained identity categories. We introduce HebID, the first multilabel Hebrew corpus for social identity detection. The corpus contains 5,536 sentences from Israeli politicians’ Facebook posts (Dec 2018-Apr 2021), with each sentence manually annotated for twelve nuanced social identities (e.g., Rightist, Ultra-Orthodox, Socially-oriented) selected based on their salience in national survey data. We benchmark multilabel and single-label encoders alongside 2B-9B-parameter decoder LLMs, finding that Hebrew-tuned LLMs provide the best results (macro-F1 = 0.74). We apply our classifier to politicians’ Facebook posts and parliamentary speeches, evaluating differences in popularity, temporal trends, clustering patterns, and gender-related variations in identity expression. We utilize identity choices from a national public survey, comparing the identities portrayed in elite discourse with those prioritized by the public. HebID provides a comprehensive foundation for studying social identities in Hebrew and can serve as a model for similar research in other non-English political contexts
2024
pdf
bib
abs
IsraParlTweet: The Israeli Parliamentary and Twitter Resource
Guy Mor-Lan
|
Effi Levi
|
Tamir Sheafer
|
Shaul R. Shenhav
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
We introduce IsraParlTweet, a new linked corpus of Hebrew-language parliamentary discussions from the Knesset (Israeli Parliament) between the years 1992-2023 and Twitter posts made by Members of the Knesset between the years 2008-2023, containing a total of 294.5 million Hebrew tokens. In addition to raw text, the corpus contains comprehensive metadata on speakers and Knesset sessions as well as several linguistic annotations. As a result, IsraParlTweet can be used to conduct a wide variety of quantitative and qualitative analyses and provide valuable insights into political discourse in Israel.
pdf
bib
abs
Exploring Factual Entailment with NLI: A News Media Study
Guy Mor-Lan
|
Effi Levi
Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024)
We explore the relationship between factuality and Natural Language Inference (NLI) by introducing FactRel – a novel annotation scheme that models factual rather than textual entailment, and use it to annotate a dataset of naturally occurring sentences from news articles. Our analysis shows that 84% of factually supporting pairs and 63% of factually undermining pairs do not amount to NLI entailment or contradiction, respectively, suggesting that factual relationships are more apt for analyzing media discourse. We experiment with models for pairwise classification on the new dataset, and find that in some cases, generating synthetic data with GPT-4 on the basis of the annotated dataset can improve performance. Surprisingly, few-shot learning with GPT-4 yields strong results on par with medium LMs (DeBERTa) trained on the labelled dataset. We hypothesize that these results indicate the fundamental dependence of this task on both world knowledge and advanced reasoning abilities.
2022
pdf
bib
abs
Detecting Narrative Elements in Informational Text
Effi Levi
|
Guy Mor
|
Tamir Sheafer
|
Shaul R. Shenhav
Findings of the Association for Computational Linguistics: NAACL 2022
Automatic extraction of narrative elements from text, combining narrative theories with computational models, has been receiving increasing attention over the last few years. Previous works have utilized the oral narrative theory by Labov and Waletzky to identify various narrative elements in personal stories texts. Instead, we direct our focus to informational texts, specifically news stories. We introduce NEAT (Narrative Elements AnnoTation) – a novel NLP task for detecting narrative elements in raw text. For this purpose, we designed a new multi-label narrative annotation scheme, better suited for informational text (e.g. news media), by adapting elements from the narrative theory of Labov and Waletzky (Complication and Resolution) and adding a new narrative element of our own (Success). We then used this scheme to annotate a new dataset of 2,209 sentences, compiled from 46 news articles from various category domains. We trained a number of supervised models in several different setups over the annotated dataset to identify the different narrative elements, achieving an average F1 score of up to 0.77. The results demonstrate the holistic nature of our annotation scheme as well as its robustness to domain category.