Jaimeen Ahn


2022

Why Knowledge Distillation Amplifies Gender Bias and How to Mitigate from the Perspective of DistilBERT
Jaimeen Ahn | Hwaran Lee | Jinhwa Kim | Alice Oh
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Knowledge distillation is widely used to transfer the language understanding of a large model to a smaller model. However, after knowledge distillation, it was found that the smaller model is more biased by gender compared to the source large model. This paper studies what causes gender bias to increase after the knowledge distillation process. Moreover, we suggest applying a variant of the mixup on knowledge distillation, which is used to increase generalizability during the distillation process, not for augmentation. By doing so, we can significantly reduce the gender bias amplification after knowledge distillation. We also conduct an experiment on the GLUE benchmark to demonstrate that even if the mixup is applied, it does not have a significant adverse effect on the model’s performance.

2021

Mitigating Language-Dependent Ethnic Bias in BERT
Jaimeen Ahn | Alice Oh
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In this paper, we study ethnic bias and how it varies across languages by analyzing and mitigating ethnic bias in monolingual BERT for English, German, Spanish, Korean, Turkish, and Chinese. To observe and quantify ethnic bias, we develop a novel metric called Categorical Bias score. Then we propose two methods for mitigation: first, using a multilingual model, and second, using contextual word alignment of two monolingual models. We compare our proposed methods with monolingual BERT and show that these methods effectively alleviate the ethnic bias. Which of the two methods works better depends on the amount of NLP resources available for that language. We additionally experiment with Arabic and Greek to verify that our proposed methods work for a wider variety of languages.

2020

Suicidal Risk Detection for Military Personnel
Sungjoon Park | Kiwoong Park | Jaimeen Ahn | Alice Oh
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We analyze social media for detecting the suicidal risk of military personnel, which is especially crucial for countries with compulsory military service such as the Republic of Korea. From a widely-used Korean social Q&A site, we collect posts containing military-relevant content written by active-duty military personnel. We then annotate the posts with two groups of experts: military experts and mental health experts. Our dataset includes 2,791 posts with 13,955 corresponding expert annotations of suicidal risk levels, and this dataset is available to researchers who consent to a research ethics agreement. Using various fine-tuned state-of-the-art language models, we predict the level of suicide risk, reaching a .88 F1 score for classifying the risks.