Ilya Koziev
2025
Generation of Russian Poetry of Different Genres and Styles Using Neural Networks with Character-Level Tokenization
Ilya Koziev
|
Alena Fenogenova
Proceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025)
Automatic poetry generation is an immensely complex task, even for the most advanced Large Language Models (LLMs) that requires a profound understanding of intelligence, world and linguistic knowledge, and a touch of creativity.This paper investigates the use of LLMs in generating Russian syllabo-tonic poetry of various genres and styles. The study explores a character-level tokenization architectures and demonstrates how a language model can be pretrained and finetuned to generate poetry requiring knowledge of a language’s phonetics. Additionally, the paper assesses the quality of the generated poetry and the effectiveness of the approach in producing different genres and styles. The study’s main contribution is the introduction of two end-to-end architectures for syllabo-tonic Russian poetry: pretrained models, a comparative analysis of the approaches, and poetry evaluation metrics.