Aleksandra Krasnodębska

Also published as: Aleksandra Krasnodebska

2026

Safety of Large Language Models Beyond English: A Systematic Literature Review of Risks, Biases, and Safeguards
Aleksandra Krasnodębska | Katarzyna Dziewulska | Karolina Seweryn | Maciej Chrabaszcz | Wojciech Kusa
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

As Large Language Models (LLMs) continue to evolve, ensuring their safety across multiple languages has become a critical concern. While LLMs demonstrate impressive capabilities in English, their safety mechanisms may not generalize effectively to other languages, leading to disparities in toxicity detection, bias mitigation, and harm prevention. This systematic review examines the multilingual safety of LLMs by synthesizing findings from recent studies that evaluate their robustness across diverse linguistic and cultural contexts beyond English language. Our review explores the methodologies used to assess multilingual safety, identifies challenges such as dataset availability and evaluation biases. Based on our analysis we highlight gaps in multilingual safety research and provide recommendations for future work. This review aims to contribute to the development of fair and effective safety mechanisms for LLMs across all languages. We provide the extracted data in an interactive Streamlit dashboard, enabling transparent access to the raw data and allowing for continuous updates.

pdf bib abs

Annotation-Efficient Vision-Language Model Adaptation to the Polish Language Using the LLaVA Framework
Grzegorz Statkiewicz | Alicja Dobrzeniecka | Karolina Seweryn | Aleksandra Krasnodębska | Karolina Piosek | Katarzyna Bogusz | Sebastian Cygert | Wojciech Kusa
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Most vision-language models (VLMs) are trained on English-centric data, limiting their performance in other languages and cultural contexts. This restricts their usability for non-English-speaking users and hinders the development of multimodal systems that reflect diverse linguistic and cultural realities. In this work, we reproduce and adapt the LLaVA-Next methodology to create a set of Polish VLMs. We rely on a fully automated pipeline for translating and filtering existing multimodal datasets, and complement this with synthetic Polish data for OCR and culturally specific tasks. Despite relying almost entirely on automatic translation and minimal manual intervention, our approach yields strong results: we observe a +9.5 pp improvement over LLaVA-1.6-Vicuna-13B on a Polish-adapted MMBench, along with higher-quality captions in generative evaluations, as measured by human annotators in terms of linguistic correctness. These findings highlight that large-scale automated translation, combined with lightweight filtering, can effectively bootstrap high-quality multimodal models for low-resource languages. Some challenges remain, particularly in cultural coverage and evaluation. To facilitate further research, we release our models, code, and datasets.

2025

pdf bib abs

PL-Guard: Benchmarking Language Model Safety for Polish
Aleksandra Krasnodebska | Karolina Seweryn | Szymon Łukasik | Wojciech Kusa
Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025)

We present a benchmark dataset for evaluating language model safety in Polish, addressing the underrepresentation of medium-resource languages in existing safety assessments. Our dataset includes both original and adversarially perturbed examples. We fine-tune and evaluate multiple models—LlamaGuard-3-8B, a HerBERT-based classifier, and PLLuM—and find that the HerBERT-based model outperforms others, especially under adversarial conditions.

Alignment is the critical process of minimizing harmful outputs by teaching large language models (LLMs) to prefer safe, helpful and appropriate responses. While the majority of alignment research and datasets remain overwhelmingly English-centric, ensuring safety across diverse linguistic and cultural contexts requires localized resources. In this paper, we introduce the first Polish preference dataset PLLuM-Align, created entirely through human annotation to reflect Polish language and cultural nuances. The dataset includes response rating, ranking, and multi-turn dialog data. Designed to reflect the linguistic subtleties and cultural norms of Polish, this resource lays the groundwork for more aligned Polish LLMs and contributes to the broader goal of multilingual alignment in underrepresented languages.

pdf bib abs

Rainbow-Teaming for the Polish Language: A Reproducibility Study
Aleksandra Krasnodębska | Maciej Chrabaszcz | Wojciech Kusa
Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)

The development of multilingual large language models (LLMs) presents challenges in evaluating their safety across all supported languages. Enhancing safety in one language (e.g., English) may inadvertently introduce vulnerabilities in others. To address this issue, we implement a methodology for the automatic creation of red-teaming datasets for safety evaluation in Polish language. Our approach generates both harmful and non-harmful prompts by sampling different risk categories and attack styles. We test several open-source models, including those trained on Polish data, and evaluate them using metrics such as Attack Success Rate (ASR) and False Reject Rate (FRR). The results reveal clear gaps in safety performance between models and show that better testing across languages is needed.

Venues

Fix author