Elisei Rykov


2024

SmurfCat at SemEval-2024 Task 6: Leveraging Synthetic Data for Hallucination Detection
Elisei Rykov | Yana Shishkina | Ksenia Petrushina | Ksenia Titova | Sergey Petrakov | Alexander Panchenko
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

In this paper, we present our novel systems developed for the SemEval-2024 hallucination detection task. Our investigation spans a range of strategies to compare model predictions with reference standards, encompassing diverse baselines, the refinement of pre-trained encoders through supervised learning, and ensemble approaches utilizing several high-performing models. Through these explorations, we introduce three distinct methods that exhibit strong performance metrics. To amplify our training data, we generate additional training samples from the unlabelled training subset. Furthermore, we provide a detailed comparative analysis of our approaches. Notably, our premier method achieved a commendable 9th place in the competition’s model-agnostic track and 20th place in the model-aware track, highlighting its effectiveness and potential.

2022

RuDSI: Graph-based Word Sense Induction Dataset for Russian
Anna Aksenova | Ekaterina Gavrishina | Elisei Rykov | Andrey Kutuzov
Proceedings of TextGraphs-16: Graph-based Methods for Natural Language Processing

We present RuDSI, a new benchmark for word sense induction (WSI) in Russian. The dataset was created using manual annotation and semi-automatic clustering of Word Usage Graphs (WUGs). RuDSI is completely data-driven (based on texts from the Russian National Corpus), with no external word senses imposed on annotators. We present and analyze RuDSI, describe our annotation workflow, show how graph clustering parameters affect the dataset, report the performance that several baseline WSI methods obtain on RuDSI, and discuss possibilities for improving these scores.