Daniil Vodolazsky
2025
2Columns1Row: A Russian Benchmark for Textual and Multimodal Table Understanding and Reasoning
Vildan Saburov | Daniil Vodolazsky | Danil Sazanakov | Alena Fenogenova
Findings of the Association for Computational Linguistics: EMNLP 2025
Table understanding is a crucial task in document processing and is commonly encountered in practical applications. We introduce 2Columns1Row, the first open-source benchmark for the table question answering task in Russian. This benchmark evaluates the ability of models to reason about the relationships between rows and columns in tables, employing both textual and multimodal inputs. 2Columns1Row consists of six datasets (28,800 tables) that vary in the complexity of the text within the table contents and the consistency of the values in the cells. We evaluate the models using text-only and multimodal approaches and analyze their performance. Through extensive evaluation, we demonstrate the limitations of current multimodal models on this task and show the feasibility of a dynamic text-based system utilizing our benchmark. Our results highlight significant opportunities for advancing table understanding and reasoning, providing a solid foundation for future research in this domain.
2022
Constructing a Lexical Resource of Russian Derivational Morphology
Lukáš Kyjánek | Olga Lyashevskaya | Anna Nedoluzhko | Daniil Vodolazsky | Zdeněk Žabokrtský
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Words of any language are to some extent related through the ways they are formed. For instance, the verb ‘exempl-ify’ and the noun ‘example-s’ are both based on the word ‘example’, but the verb is derived from it, while the noun is inflected. In Natural Language Processing of Russian, inflection is satisfactorily processed; however, there are only a few machine-trackable resources that capture derivations, even though both of these morphological processes are very rich in Russian. Therefore, we devote this paper to improving one of the methods of constructing such resources and to applying the method to a Russian lexicon, which results in the creation of the largest lexical resource of Russian derivational relations. The resulting database, dubbed DeriNet.RU, includes more than 300 thousand lexemes connected by more than 164 thousand binary derivational relations. To create this data, we combined existing machine-learning methods, which we improved for this purpose. The whole approach is evaluated on our newly created data set of manual, parallel annotation. The resulting DeriNet.RU is freely available under an open license agreement.
2020
DerivBase.Ru: a Derivational Morphology Resource for Russian
Daniil Vodolazsky
Proceedings of the Twelfth Language Resources and Evaluation Conference
Russian morphology has been studied for decades, but there is still no large, high-coverage resource that contains the derivational families (groups of words that share the same root) of Russian words. The number of words used in different areas of the language grows rapidly, so human-made dictionaries published long ago cannot cover neologisms and domain-specific lexicons. To fill this resource gap, we have developed a rule-based framework for deriving words and applied it to build a derivational morphology resource named DerivBase.Ru, which we introduce in this paper.