2025
From Measurement to Mitigation: Exploring the Transferability of Debiasing Approaches to Gender Bias in Maltese Language Models
Melanie Galea | Claudia Borg
Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
The advancement of Large Language Models (LLMs) has transformed Natural Language Processing (NLP), enabling performance across diverse tasks with little task-specific training. However, LLMs remain susceptible to social biases, particularly reflecting harmful stereotypes from training data, which can disproportionately affect marginalised communities. We measure gender bias in Maltese LMs, arguing that such bias is harmful as it reinforces societal stereotypes and fails to account for gender diversity, which is especially problematic in gendered, low-resource languages. While bias evaluation and mitigation efforts have progressed for English-centric models, research on low-resourced and morphologically rich languages remains limited. This research investigates the transferability of debiasing methods to Maltese language models, focusing on BERTu and mBERTu, BERT-based monolingual and multilingual models respectively. Bias measurement and mitigation techniques from English are adapted to Maltese, using benchmarks such as CrowS-Pairs and SEAT, alongside debiasing methods Counterfactual Data Augmentation, Dropout Regularization, Auto-Debias, and GuiDebias. We also contribute to future work in the study of gender bias in Maltese by creating evaluation datasets. Our findings highlight the challenges of applying existing bias mitigation methods to linguistically complex languages, underscoring the need for more inclusive approaches in the development of multilingual NLP.
2024
UM IWSLT 2024 Low-Resource Speech Translation: Combining Maltese and North Levantine Arabic
Sara Nabhani | Aiden Williams | Miftahul Jannat | Kate Rebecca Belcher | Melanie Galea | Anna Taylor | Kurt Micallef | Claudia Borg
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)
The IWSLT low-resource track encourages innovation in the field of speech translation, particularly in data-scarce conditions. This paper details our submission for the IWSLT 2024 low-resource track shared task for Maltese-English and North Levantine Arabic-English spoken language translation using an unconstrained pipeline approach. Using language models, we improve ASR performance by correcting the produced output. We present a two-step approach for MT using data from external sources, showing improvements over baseline systems. We also explore transliteration as a means to further augment MT data and exploit the cross-lingual similarities between Maltese and Arabic.
UOM-Constrained IWSLT 2024 Shared Task Submission - Maltese Speech Translation
Kurt Abela | Md Abdur Razzaq Riyadh | Melanie Galea | Alana Busuttil | Roman Kovalev | Aiden Williams | Claudia Borg
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)
This paper presents our IWSLT 2024 shared task submission on the low-resource track. The submission forms part of the constrained setup, which implies limited data for training. Following the introduction, the paper consists of a literature review covering previous approaches to speech translation and their application to Maltese, followed by the methodology, evaluation and results, and the conclusion. A cascaded submission for the Maltese-to-English language pair is presented, consisting of a pipeline containing a DeepSpeech 1 Automatic Speech Recognition (ASR) system, a KenLM model to optimise the transcriptions, and finally an LSTM machine translation model. The submission achieves a 0.5 BLEU score on the overall test set, and the ASR system achieves a word error rate of 97.15%. Our code is made publicly available.