Dmitrii Kosenko


2026

This paper presents SAGE, an open-access framework comprising a set of models specifically designed for the generative correction of spelling, punctuation, and capitalization errors in Russian. The release includes four models, featuring a Russian-English version and a distilled version for ease of use and cost-effectiveness. The models are pre-trained with a sequence-to-sequence approach on artificial errors that mimic human mistakes and fine-tuned on annotated multi-domain texts. A set of carefully engineered auxiliary learning objectives is employed during pre-training to enrich the models with additional semantic and syntactic information. Evaluations indicate that SAGE models, despite their small number of parameters, outperform top-tier multilingual and Russian-specific large language models, both closed- and open-source, establishing a new state of the art. We release an online demo, powered by a single Nvidia A100 80GB GPU, as a Web service that allows users to simultaneously test the most advanced 1.7B-parameter SAGE model, its distilled version, and the Russian-English SAGE model.
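The pre-training recipe above relies on artificial errors that mimic human mistakes. A minimal sketch of such a corruptor is shown below; the specific operations (deletion, transposition, case flip) and the noise rate `p` are illustrative assumptions, not SAGE's actual augmentation pipeline:

```python
import random

def corrupt(text: str, p: float = 0.1, seed: int = 0) -> str:
    """Inject typo-like noise: deletions, transpositions, and case flips."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        r = rng.random()
        if r < p / 3 and chars[i].isalpha():
            i += 1  # deletion: drop this character
            continue
        if r < 2 * p / 3 and i + 1 < len(chars):
            out.extend([chars[i + 1], chars[i]])  # transposition of neighbors
            i += 2
            continue
        if r < p and chars[i].isalpha():
            out.append(chars[i].swapcase())  # capitalization error
            i += 1
            continue
        out.append(chars[i])  # keep the character unchanged
        i += 1
    return "".join(out)

clean = "Example sentence for pre-training."
noisy = corrupt(clean)
# (noisy, clean) pairs then serve as seq2seq training examples
```

With `p = 0.0` the text passes through unchanged, so the same function can be used to ablate the noise level during data generation.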

2024

This paper presents the solution of the DeepPavlov team for the Multimodal Sentiment Cause Analysis competition in SemEval-2024 Task 3, Subtask 2 (Wang et al., 2024). On the evaluation leaderboard, our approach ranks 7th with an F1-score of 0.2132. Large Language Models (LLMs) are transformative in their ability to comprehend and generate human-like text. With recent advancements, Multimodal Large Language Models (MLLMs) have expanded LLM capabilities, integrating modalities such as audio, vision, and language. Our work examines the state-of-the-art MLLM Video-LLaMA, its associated modalities, and its application to the emotion reasoning downstream task, Multimodal Emotion Cause Analysis in Conversations (MECAC). We investigate the model's performance in several modes: zero-shot, few-shot, individual embeddings, and fine-tuned, providing insights into their limits and potential enhancements for emotion understanding.