Egor Kratkov


2025

RAGulator: Effective RAG for Regulatory Question Answering
Islam Aushev | Egor Kratkov | Evgenii Nikolaev | Andrei Glinskii | Vasilii Krikunov | Alexander Panchenko | Vasily Konovalov | Julia Belikova
Proceedings of the 1st Regulatory NLP Workshop (RegNLP 2025)

Regulatory Natural Language Processing (RegNLP) is a multidisciplinary domain focused on facilitating access to and comprehension of regulations and regulatory requirements. This paper outlines our strategy for building a system for the Regulatory Information Retrieval and Answer Generation (RIRAG) challenge, held at the RegNLP 2025 Workshop. The objective of this competition is to design a system that efficiently retrieves pertinent passages from regulatory texts (ObliQA) and then generates accurate, cohesive answers to questions about compliance and obligations. Our method employs lightweight BM25 pre-filtering to retrieve relevant passages; this step efficiently shortlists candidates for subsequent processing with Transformer-based embeddings, thereby optimizing the use of computational resources.
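A minimal sketch of this two-stage retrieval, assuming the rank_bm25 and sentence-transformers libraries; the corpus, query, model name, and shortlist size are illustrative placeholders, not the configuration used in the paper.

from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = [
    "A licensed firm must report suspicious transactions within 24 hours.",
    "Client assets must be held in segregated accounts.",
    "Annual audits are required for all regulated entities.",
]
query = "How quickly must suspicious transactions be reported?"

# Stage 1: cheap BM25 scoring shortlists candidate passages.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
scores = bm25.get_scores(query.lower().split())
shortlist = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:2]

# Stage 2: Transformer embeddings re-rank only the shortlist, so the
# expensive model never has to encode the full corpus.
model = SentenceTransformer("all-MiniLM-L6-v2")
query_emb = model.encode(query, convert_to_tensor=True)
cand_embs = model.encode([corpus[i] for i in shortlist], convert_to_tensor=True)
sims = util.cos_sim(query_emb, cand_embs)[0]
print(corpus[shortlist[int(sims.argmax())]])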

TabaQA at SemEval-2025 Task 8: Column Augmented Generation for Question Answering over Tabular Data
Ekaterina Antropova | Egor Kratkov | Roman Derunets | Margarita Trofimova | Ivan Bondarenko | Alexander Panchenko | Vasily Konovalov | Maksim Savkin
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

The DataBench shared task at SemEval-2025 aims to tackle question answering (QA) over tabular data. Given the diversity of table structures, there are different approaches to retrieving the answer. Although Retrieval-Augmented Generation (RAG) is a viable solution, extracting relevant information from tables remains challenging. In addition, a table can be prohibitively large for direct integration into the LLM context. In this paper, we address QA over tabular data in three steps: we first identify the columns that might contain the answer, then an LLM generates an answer given the context of those columns, and finally the LLM refines its answer. This approach secured us 7th place in the DataBench lite category.
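A minimal sketch of this three-step pipeline, assuming pandas and an llm(prompt) -> str callable that stands in for any chat-completion API; the prompt wording is an illustrative assumption, not the exact prompts from the paper.

import pandas as pd
from typing import Callable

def answer_question(df: pd.DataFrame, question: str, llm: Callable[[str], str]) -> str:
    # Step 1: identify relevant columns from the header alone, so the
    # (possibly huge) table never enters the LLM context at this stage.
    reply = llm(
        f"Columns: {list(df.columns)}\nQuestion: {question}\n"
        "Reply with a comma-separated list of the columns that may contain the answer."
    )
    relevant = [c.strip() for c in reply.split(",") if c.strip() in df.columns]

    # Step 2: generate a draft answer from the selected columns only.
    context = df[relevant].to_csv(index=False)
    draft = llm(f"Table columns:\n{context}\nQuestion: {question}\nAnswer concisely.")

    # Step 3: ask the LLM to check and refine its own draft.
    return llm(
        f"Table columns:\n{context}\nQuestion: {question}\n"
        f"Draft answer: {draft}\n"
        "Verify the draft against the table and return a corrected final answer."
    )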