Zsolt T. Kardkovács
Also published as: Zsolt T Kardkovacs
2025
BTC-SAM: Leveraging LLMs for Generation of Bias Test Cases for Sentiment Analysis Models
Zsolt T. Kardkovács | Lynda Djennane | Anna Field | Boualem Benatallah | Yacine Gaci | Fabio Casati | Walid Gaaloul
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Sentiment Analysis (SA) models harbor inherent social biases that can be harmful in real-world applications. These biases are identified by examining the output of SA models for sentences that only vary in the identity groups of the subjects. Constructing natural, linguistically rich, relevant, and diverse sets of sentences that provide sufficient coverage over the domain is expensive, especially when addressing a wide range of biases: it requires domain experts and/or crowd-sourcing. In this paper, we present a novel bias testing framework, BTC-SAM, which generates high-quality test cases for bias testing in SA models with minimal specification using Large Language Models (LLMs) for the controllable generation of test sentences. Our experiments show that relying on LLMs can provide high linguistic variation and diversity in the test sentences, thereby offering better test coverage compared to base prompting methods even for previously unseen biases.
2022
TF-IDF or Transformers for Arabic Dialect Identification? ITFLOWS participation in the NADI 2022 Shared Task
Fouad Shammary | Yiyi Chen | Zsolt T Kardkovacs | Mehwish Alam | Haithem Afli
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
This study targets the shared task of Nuanced Arabic Dialect Identification (NADI) organized with the Workshop on Arabic Natural Language Processing (WANLP). It focuses on Subtask 1, the identification of Arabic dialects at the country level. More specifically, it studies the impact of a traditional approach, TF-IDF, and then of advanced deep-learning-based methods. These methods include full fine-tuning of MARBERT as well as adapter-based fine-tuning of MARBERT, with and without data augmentation. The evaluation shows that the traditional TF-IDF approach achieves the best accuracy on the TEST-A dataset, while MARBERT fine-tuned with adapters on augmented data ranks second in Macro F1-score on the TEST-B dataset. As a result, the proposed system ranked second in the shared task on average.
Co-authors
- Haithem Afli 1
- Mehwish Alam 1
- Boualem Benatallah 1
- Fabio Casati 1
- Yiyi Chen 1