Hristina Kukova
2026
Bulgarian Massive Multitask Language Understanding Benchmark
Svetla Peneva Koeva | Ivelina Stoyanova | Dimiter Georgiev | Svetlozara Leseva | Valentina Stefanova | Maria Todorova | Tsvetana Ivanova Dimitrova | Hristina Kukova | Mihaela Moskova | Tinko Tinchev
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Assessing the broad general knowledge of Large Language Models (LLMs) across multiple domains in Bulgarian remains challenging due to the limited availability of Bulgarian evaluation benchmarks. To address this gap, we introduce the Bulgarian Massive Multitask Language Understanding benchmark (MMLU-BG), designed to evaluate whether LLMs possess generalised knowledge capabilities beyond simple text prediction in Bulgarian. This paper presents the structure, the development protocol, and the size of the MMLU-BG benchmark. The benchmark is tested in comparison with the original MMLU for English across seven LLMs selected according to specific criteria. The experiments demonstrate that the MMLU-BG benchmark assesses multi-domain versatility and highlights the models’ strengths and weaknesses across different subject areas.
2024
Multilingual Corpus of Illustrative Examples on Activity Predicates
Ivelina Stoyanova | Hristina Kukova | Maria Todorova | Tsvetana Dimitrova
Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024)
The paper presents the ongoing compilation of a multilingual corpus of illustrative examples to supplement our work on the syntactic and semantic analysis of predicates representing activities in Bulgarian and other languages. The corpus aims to include over 1,000 illustrative examples of verbs from six semantic classes of predicates (verbs of motion, contact, consumption, creation, competition and bodily functions), which provide a basis for observations on the specificity of their realisation. The corpus will be used for contrastive studies and further elaboration on the scope and behaviour of activity verbs in general, as well as of their semantic subclasses.
Assessing Reading Literacy of Bulgarian Pupils with Finger–tracking
Alessandro Lento | Andrea Nadalini | Marcello Ferro | Claudia Marzi | Vito Pirrelli | Tsvetana Dimitrova | Hristina Kukova | Valentina Stefanova | Maria Todorova | Svetla Koeva
Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024)
The paper reports on the first steps in developing a time-stamped multimodal dataset of reading data by Bulgarian children. Data are being collected, structured and analysed by means of ReadLet, an innovative infrastructure for multimodal language data collection that uses a tablet as a reader’s front-end. The overall goal of the project is to quantitatively analyse the reading skills of a sample of early Bulgarian readers collected over a two-year period, and compare them with the reading data of early readers of Italian, collected using the same protocol. We illustrate design issues of the experimental protocol, as well as the data acquisition process and the post-processing phase of data annotation/augmentation. To evaluate the potential and usefulness of the Bulgarian dataset for reading research, we present some preliminary statistical analyses of our recently collected data. They show robust convergence trends between Bulgarian and Italian early reading development stages.
2012
Application of Clause Alignment for Statistical Machine Translation
Svetla Koeva | Svetlozara Leseva | Ivelina Stoyanova | Rositsa Dekova | Angel Genov | Borislav Rizov | Tsvetana Dimitrova | Ekaterina Tarpomanova | Hristina Kukova
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation