Nikolai Efimov


2026

Recognizing disinformation is a challenging task for humans and AI systems alike. News can be false, misleading, or harmful, and its interpretation often depends on the cultural context of the audience. However, existing datasets rarely account for these contextual and cultural differences, as they are typically not designed from the perspective of news consumers. To address this gap, we present the Information Disorder (InDor) corpus, a multilingual dataset of news articles in English, Farsi, Italian, and Russian, annotated for information disorder detection and explanation. The corpus was developed through a participatory process involving contributors from diverse cultural and professional backgrounds, who engaged in data collection, annotation, and evaluation of Large Language Model (LLM) performance on the task. Our findings highlight that false and manipulated news manifests differently across cultural settings, and that current LLMs fail to adequately capture this complexity. This underscores the need for culturally aware computational approaches in the study of information disorder.
We present the first systematic study of core NLP tasks for Sakha (Yakut), a low-resource Turkic language with approximately 450,000 speakers in northeastern Siberia. We introduce two manually annotated datasets: a 690-sentence NER corpus (921 entities: PER, LOC, ORG) and a 798-sentence sentiment corpus (positive, negative, neutral). Using mBERT and RuBERT in controlled 2×2 experiments, we observe a twofold effect: performance improves when base unknown-token rates exceed approximately 10% (RuBERT: +9.4 F1) but degrades otherwise (mBERT: −6.1 F1), even though tokenization improves in both cases. Cross-domain transfer (news vs. forums) reveals severe asymmetry: formal-to-informal training achieves 47% accuracy while the reverse yields only 26%, a 21-point gap demonstrating that domain composition dominates model architecture choice in low-resource settings. Neutral-boundary detection is the primary bottleneck, with 89% of disagreements clustering around subjective/objective distinctions rather than polarity confusions. With fewer than 1,000 samples per task, we establish first benchmarks for Sakha NER (53.5 F1) and sentiment analysis (54% accuracy).