Jordan Konstantinov Kralev
2025
IfGPT: A Dataset in Bulgarian for Large Language Models
Svetla Peneva Koeva | Ivelina Stoyanova | Jordan Konstantinov Kralev
Proceedings of the First Workshop on Advancing NLP for Low-Resource Languages
The paper presents IfGPT, a large dataset that brings together available corpora and datasets for Bulgarian, and describes methods for continuously expanding it with deduplicated and unbiased Bulgarian data. The samples in the dataset are annotated with metadata that enable effective extraction of domain- and application-oriented subsets for fine-tuning large language models (LLMs) or for retrieval-augmented generation (RAG). The paper focuses on the extended metadata of the IfGPT dataset and on its management in a graph database.
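As a rough illustration of the two ideas in this abstract (removing duplicates and using metadata to extract a domain-oriented subset), the following minimal Python sketch may help. It is not the authors' pipeline: the sample records, the metadata keys `domain` and `license`, and the helper names `deduplicate` and `select` are all hypothetical, exact-match hashing stands in for whatever deduplication method the paper uses, and a plain list stands in for the graph database.

```python
import hashlib


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash the same."""
    return " ".join(text.lower().split())


def deduplicate(samples):
    """Keep the first sample for each distinct normalized text (exact-match only)."""
    seen = set()
    unique = []
    for sample in samples:
        digest = hashlib.sha256(normalize(sample["text"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique


def select(samples, **criteria):
    """Return samples whose metadata matches every key/value in criteria."""
    return [
        s for s in samples
        if all(s["meta"].get(key) == value for key, value in criteria.items())
    ]


# Toy records with hypothetical metadata fields (not taken from IfGPT).
samples = [
    {"text": "Законът влиза в сила днес.", "meta": {"domain": "law", "license": "open"}},
    {"text": "Законът  влиза в сила днес.", "meta": {"domain": "law", "license": "open"}},
    {"text": "Времето утре ще бъде слънчево.", "meta": {"domain": "news", "license": "open"}},
]

unique = deduplicate(samples)             # the second record differs only in spacing
law_subset = select(unique, domain="law")  # domain-oriented subset for fine-tuning or RAG
```

Exact hashing only catches duplicates that are identical after normalization; near-duplicate detection would need a fuzzier method, which this sketch does not attempt.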
Fusion of Object-Centric and Linguistic Features for Domain-Adapted Multimodal Learning
Jordan Konstantinov Kralev
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Modern multimodal systems often struggle to link domain-specific visual content with textual descriptions, especially when object recognition is limited to general categories (e.g., COCO classes) and is not adapted to the language model. In this paper, we present a novel framework that integrates a domain-adapted Detectron2 model into predefined models via a trainable projection layer, enabling precise cross-modal adaptation for specialised domains. Our approach extends Detectron2’s recognition capabilities to new categories by fine-tuning on multi-domain datasets, while a lightweight linear projection layer maps region-based visual features to the model’s embedding space without completely retraining the model. We evaluate the framework on domain-specific image captioning. The presented approach provides a scalable design for combining domain-specific visual recognition with language inference, with applications in domains that require fine-grained multimodal understanding.
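The linear projection step mentioned in this abstract can be pictured with a small, dependency-free Python sketch. It shows only the forward pass of a linear map from per-region visual features to a language model's embedding size. The dimensions, the random stand-in features, and the function names `init_projection` and `project_regions` are assumptions made for illustration; nothing here reproduces the paper's actual architecture, training procedure, or Detectron2's API.

```python
import random


def init_projection(d_visual, d_model, seed=0):
    """Create a weight matrix W (d_model x d_visual) and a zero bias b."""
    rng = random.Random(seed)
    scale = 1.0 / d_visual ** 0.5
    W = [[rng.uniform(-scale, scale) for _ in range(d_visual)] for _ in range(d_model)]
    b = [0.0] * d_model
    return W, b


def project_regions(regions, W, b):
    """Apply y = W x + b to each region feature vector x."""
    return [
        [sum(w * x for w, x in zip(row, feat)) + bias for row, bias in zip(W, b)]
        for feat in regions
    ]


# Hypothetical sizes: 3 detected regions, 8-d visual features, 4-d embeddings.
rng = random.Random(1)
regions = [[rng.random() for _ in range(8)] for _ in range(3)]  # stand-in for detector features

W, b = init_projection(d_visual=8, d_model=4)
visual_tokens = project_regions(regions, W, b)  # one embedding-sized vector per region
```

In a setup of this kind, only `W` and `b` would be updated during training while the detector and language model stay fixed, which is what keeps the adaptation lightweight; the projected vectors could then be fed to the language model alongside its text embeddings.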