Fernanda Bufon Färber

Also published as: Fernanda Bufon Farber


2026

The development of robust hate speech detection systems remains limited by the lack of large-scale, fine-grained training data, especially for languages beyond English. Existing corpora typically rely on simplistic toxic and non-toxic labels, and the few that capture hate directed at specific minority groups lack the positive counterexamples required to distinguish genuine hate from mere discussion. In this work, we introduce ToxSyn-PT, the first Portuguese large-scale corpus explicitly designed for multi-label hate speech detection across nine protected minority groups, including the non-toxic counterexamples absent in all other public datasets. Generated via a controllable four-stage pipeline, ToxSyn contains discourse-type annotations to capture rhetorical strategies of toxic/non-toxic language, such as sarcasm, dehumanization, and cultural appreciation. Our experiments reveal a catastrophic, mutual generalization failure compared to existing datasets from social-media domains: models trained on social media struggle to generalize to minority-specific contexts, and vice-versa. This finding indicates they are distinct tasks and exposes summary metrics like Macro F1 can be unreliable indicators of true model behavior, as they completely mask model failure. We publicly release ToxSyn on HuggingFace to support reproducible research on synthetic data generation and benchmark progress in hate-speech detection for low- and mid-resource languages.
While large language models (LLMs) show transformative potential in healthcare, their development remains focused on high-resource languages. This creates a critical barrier for other languages, as simple translation fails to capture unique clinical and cultural nuances, such as endemic diseases. To address this, we introduce MedPT, the first large-scale, real-world corpus of patient-doctor interactions for the Brazilian Portuguese medical domain. Comprising 384,095 authentic question-answer pairs and covering over 3,200 distinct health-related conditions, the dataset was refined through a rigorous multi-stage curation protocol that employed a hybrid quantitative-qualitative analysis to filter noise and contextually enrich thousands of ambiguous queries, resulting in a corpus of approximately 57 million tokens. We further utilize of LLM-driven annotation to classify queries into seven semantic types to capture user intent. To validate MedPT’s utility, we benchmark it in a medical specialty classification task: fine-tuning a 1.7B parameter model achieves an outstanding 94% F1-score on a 20-class setup. Furthermore, our qualitative error analysis shows misclassifications are not random but reflect genuine clinical ambiguities (e.g., between comorbid conditions), proving the dataset’s deep semantic richness. We publicly release MedPT on Hugging Face to support the development of more equitable, accurate, and culturally-aware medical technologies for the Portuguese-speaking world.

2025

Prompt injection and jailbreak attacks remain a critical vulnerability for deployed large language models (LLMs), allowing adversaries to bypass safety protocols and extract sensitive information. To address this, we present Proxy Barrier (ProB), a lightweight defense that interposes a proxy LLM between the user and the target model. The proxy LLM is tasked solely to repeat the user input, and any failure indicates the presence of an attempt to reveal or override system instructions, leading the malicious request to be detected and blocked before it reaches the target model. ProB therefore requires no access to model weights or prompts, and is deployable entirely at the API level. Experiments across multiple model families demonstrate that ProB achieves state-of-the-art resilience against prompt leakage and jailbreak attacks. Notably, our approach outperforms baselines and achieves up to 98.8% defense effectiveness, and shows robust protection across both open and closed-source LLMs when suitably paired with proxy models, while also keeping response quality intact.
As large language models (LLMs) become integral to social and interactive applications, the ability to model, control, and evaluate their personality traits has become a critical area of research. This survey provides a comprehensive and structured overview of the LLM-driven personality scenario. We introduce a functional taxonomy that organizes the field by how personality is modeled (from rule-based methods to model-centric and system-level LLM techniques), across which modalities it is expressed (extending beyond text to vision, speech, and immersive virtual reality), and how it is validated (covering both qualitative and quantitative evaluation paradigms). By contextualizing current advances and systematically analyzing the limitations of existing methods including subjectivity, context dependence, limited multimodal integration, and the lack of standardized evaluation protocols, we identify key research gaps. This survey serves as a guide for future inquiry, paving the way for the development LLMs with more consistent consistent, expressive, and trustworthy personality traits.