Igor Churin




2025

Long Context Benchmark for the Russian Language
Igor Churin | Murat Apishev | Maria Tikhonova | Denis Shevelev | Aydar Bulatov | Yuri Kuratov | Sergei Averkiev | Alena Fenogenova
Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)

Recent progress in Natural Language Processing (NLP) has driven the creation of Large Language Models (LLMs) capable of tackling a vast range of tasks. A critical property of these models is their ability to handle large documents and process long token sequences, which has created the need for a robust evaluation methodology for long-text scenarios. To meet this need for the Russian language, we present a benchmark of 18 datasets designed to assess LLM performance on tasks such as information retrieval, knowledge extraction, machine reading, question answering, and reasoning. The datasets are categorized into four levels of complexity, enabling model evaluation across context lengths of up to 128k tokens. To facilitate further research, we provide open-source datasets, a codebase, and a public leaderboard associated with the benchmark.