2024
MERA: A Comprehensive LLM Evaluation in Russian
Alena Fenogenova | Artem Chervyakov | Nikita Martynov | Anastasia Kozlova | Maria Tikhonova | Albina Akhmetgareeva | Anton Emelyanov | Denis Shevelev | Pavel Lebedev | Leonid Sinev | Ulyana Isaeva | Katerina Kolomeytseva | Daniil Moskovskiy | Elizaveta Goncharova | Nikita Savushkin | Polina Mikhailova | Anastasia Minaeva | Denis Dimitrov | Alexander Panchenko | Sergey Markov
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Over the past few years, one of the most notable advancements in AI research has been in foundation models (FMs), headlined by the rise of language models (LMs). However, despite researchers’ attention and the rapid growth of LM applications, the models’ capabilities, limitations, and associated risks still need to be better understood. To address these issues, we introduce a new instruction benchmark, MERA, oriented towards evaluating FMs’ performance in Russian. The benchmark encompasses 21 evaluation tasks for generative models covering 10 skills and is supplied with private answer scoring to prevent data leakage. The paper introduces a methodology for evaluating FMs and LMs in fixed zero- and few-shot instruction settings that can be extended to other modalities. Alongside this methodology, we provide an open-source code base for the MERA assessment and a leaderboard with a submission system. We evaluate open LMs as baselines and find they are still far behind the human level. We publicly release MERA to guide forthcoming research, anticipate groundbreaking model features, standardize the evaluation procedure, and address potential ethical concerns and drawbacks.
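To illustrate the fixed zero-shot instruction setting described in the abstract, here is a minimal evaluation sketch; the sample schema, prompt template, and exact-match scoring are illustrative assumptions, not MERA’s actual task format or code base.

```python
# A minimal sketch of fixed zero-shot instruction evaluation in the spirit of
# MERA; the Sample schema, instruction template, and exact-match scoring are
# illustrative assumptions and do not reproduce MERA's actual code base.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    instruction: str  # fixed instruction template for the task
    inputs: str       # task-specific input text
    answer: str       # gold answer (kept private on the MERA leaderboard)


def evaluate_zero_shot(generate: Callable[[str], str], samples: List[Sample]) -> float:
    """Exact-match accuracy of a generative model under a fixed prompt (no in-context examples)."""
    correct = 0
    for s in samples:
        prompt = f"{s.instruction}\n{s.inputs}\nОтвет:"
        prediction = generate(prompt).strip().lower()
        correct += int(prediction == s.answer.strip().lower())
    return correct / len(samples)


if __name__ == "__main__":
    # Toy task and toy "model", only to show the evaluation flow.
    toy = [Sample("Выберите правильный вариант: да или нет.", "2 + 2 = 4?", "да")]
    print(evaluate_zero_shot(lambda prompt: "да", toy))  # -> 1.0
```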
A Methodology for Generative Spelling Correction via Natural Spelling Errors Emulation across Multiple Domains and Languages
Nikita Martynov | Mark Baushenko | Anastasia Kozlova | Katerina Kolomeytseva | Aleksandr Abramov | Alena Fenogenova
Findings of the Association for Computational Linguistics: EACL 2024
Large language models excel at text generation and generalization; however, they face challenges in text editing tasks, especially in correcting spelling errors and mistypings. In this paper, we present a methodology for generative spelling correction (SC), tested on English and Russian and extensible to any language with minor changes. Our research mainly focuses on exploring natural spelling errors and mistypings in texts and studying how those errors can be emulated in correct sentences to effectively enrich generative models’ pre-training procedure. We investigate the effects of such emulation across various text domains and examine two spelling corruption techniques: 1) the first mimics human behavior when making a mistake by leveraging error statistics from a particular dataset, and 2) the second injects the most common spelling errors, keyboard miss-clicks, and some heuristic corruptions into the texts. We conducted experiments with various corruption strategies, model architectures, and model sizes at the pre-training and fine-tuning stages and evaluated the models using single-domain and multi-domain test sets. As a practical outcome of our work, we introduce SAGE (Spell checking via Augmentation and Generative distribution Emulation).
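To illustrate the first corruption technique (emulating errors from dataset statistics), here is a minimal sketch; the substitution table, error rate, and function names are illustrative assumptions and do not reproduce SAGE’s actual implementation or API.

```python
# A minimal sketch of statistics-based spelling corruption used to augment
# correct sentences for pre-training; the substitution table and sampling
# scheme are toy assumptions, not SAGE's actual implementation or API.
import random

# Toy per-character substitution statistics, e.g. mined from an error corpus:
# each entry maps a character to (replacement, probability-weight) pairs.
SUBSTITUTIONS = {
    "a": [("q", 0.4), ("s", 0.6)],
    "o": [("0", 0.3), ("p", 0.7)],
    "e": [("w", 0.5), ("r", 0.5)],
}


def corrupt(text: str, error_rate: float = 0.05, seed: int = 0) -> str:
    """Inject substitution errors into a correct sentence at a given rate."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        options = SUBSTITUTIONS.get(ch.lower())
        if options and rng.random() < error_rate:
            candidates, weights = zip(*options)
            out.append(rng.choices(candidates, weights=weights)[0])
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    clean = "generative spelling correction relies on realistic errors"
    print(corrupt(clean, error_rate=0.2))
```

In practice, corrupted/clean sentence pairs produced this way would serve as training data for a generative SC model, with the clean sentence as the target.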