Noel Crespi


2025

Towards Cross-Lingual Audio Abuse Detection in Low-Resource Settings with Few-Shot Learning
Aditya Narayan Sankaran | Reza Farahbakhsh | Noel Crespi
Proceedings of the 31st International Conference on Computational Linguistics

Online abusive content detection, particularly in low-resource settings and within the audio modality, remains underexplored. We investigate the potential of pre-trained audio representations for detecting abusive language in low-resource languages, in this case Indian languages, using Few-Shot Learning (FSL). Leveraging powerful representations from models such as Wav2Vec and Whisper, we explore cross-lingual abuse detection using the ADIMA dataset with FSL. Our approach integrates these representations within the Model-Agnostic Meta-Learning (MAML) framework to classify abusive language in 10 languages. We experiment with various shot sizes (50-200), evaluating the impact of limited data on performance. Additionally, we conduct a feature visualization study to better understand model behaviour. This study highlights the generalization ability of pre-trained models in low-resource scenarios and offers valuable insights into detecting abusive language in multilingual contexts.
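As a rough illustration of how such a pipeline can be organised, the Python sketch below runs a first-order MAML loop over pre-computed audio embeddings. The 768-dimensional features, the single linear classification head, the episode sizes, and the randomly generated toy data standing in for Wav2Vec/Whisper features are all assumptions for illustration, not the paper's actual setup.

# First-order MAML sketch over pre-computed audio embeddings (toy data).
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM, N_CLASSES, INNER_LR, META_LR, INNER_STEPS = 768, 2, 1e-2, 1e-3, 5

meta_head = nn.Linear(EMB_DIM, N_CLASSES)              # shared initialisation
meta_opt = torch.optim.Adam(meta_head.parameters(), lr=META_LR)

def sample_episode(n_support=20, n_query=20):
    """Toy episode: random embeddings + labels stand in for one language's data."""
    xs = torch.randn(n_support, EMB_DIM); ys = torch.randint(0, N_CLASSES, (n_support,))
    xq = torch.randn(n_query, EMB_DIM);   yq = torch.randint(0, N_CLASSES, (n_query,))
    return xs, ys, xq, yq

for step in range(100):                                 # meta-training iterations
    xs, ys, xq, yq = sample_episode()

    # Inner loop: adapt a copy of the head on the support set of one "language".
    fast = nn.Linear(EMB_DIM, N_CLASSES)
    fast.load_state_dict(meta_head.state_dict())
    inner_opt = torch.optim.SGD(fast.parameters(), lr=INNER_LR)
    for _ in range(INNER_STEPS):
        inner_opt.zero_grad()
        F.cross_entropy(fast(xs), ys).backward()
        inner_opt.step()

    # Outer loop (first-order): evaluate the adapted head on the query set and
    # apply its gradients to the shared initialisation.
    fast.zero_grad()
    F.cross_entropy(fast(xq), yq).backward()
    meta_opt.zero_grad()
    for p_meta, p_fast in zip(meta_head.parameters(), fast.parameters()):
        p_meta.grad = p_fast.grad.clone()
    meta_opt.step()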

Aligning Sentence Simplification with ESL Learner’s Proficiency for Language Acquisition
Guanlin Li | Yuki Arase | Noel Crespi
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Text simplification is crucial for improving accessibility and comprehension for English as a Second Language (ESL) learners. This study goes a step further and aims to facilitate ESL learners’ language acquisition through simplification. Specifically, we propose simplifying complex sentences to levels appropriate for learners while also increasing vocabulary coverage of the target level in the simplifications. We achieve this without a parallel corpus by conducting reinforcement learning on a large language model. Our method employs token-level and sentence-level rewards, and iteratively trains the model on its self-generated outputs to guide the search for simplification hypotheses that satisfy the target attributes. Experimental results on the CEFR-SP and TurkCorpus datasets show that the proposed method effectively increases the frequency and diversity of target-level vocabulary by more than 20% compared to baseline models, while maintaining high simplification quality.
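The abstract does not spell out the exact reward definitions, so the Python sketch below only illustrates how a token-level and a sentence-level signal might be combined into a single scalar reward for scoring self-generated simplifications. The target-level vocabulary, the CEFR-level classifier output, and the weight alpha are hypothetical stand-ins, not the paper's actual reward design.

# Hedged sketch: combining a token-level and a sentence-level reward signal.
def token_level_reward(tokens: list[str], target_vocab: set[str]) -> float:
    """Fraction of output tokens drawn from the target-level vocabulary."""
    if not tokens:
        return 0.0
    return sum(t.lower() in target_vocab for t in tokens) / len(tokens)

def sentence_level_reward(predicted_level: str, target_level: str) -> float:
    """1.0 if a (hypothetical) CEFR classifier assigns the target level, else 0.0."""
    return 1.0 if predicted_level == target_level else 0.0

def combined_reward(sentence: str, predicted_level: str, target_level: str,
                    target_vocab: set[str], alpha: float = 0.5) -> float:
    """Weighted sum used to score a self-generated simplification during RL."""
    tokens = sentence.split()
    return (alpha * token_level_reward(tokens, target_vocab)
            + (1 - alpha) * sentence_level_reward(predicted_level, target_level))

# Example: a B1-targeted simplification scored against a toy B1 vocabulary.
vocab_b1 = {"the", "dog", "ran", "fast", "very"}
print(combined_reward("The dog ran very fast", "B1", "B1", vocab_b1))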

2024

Improving Cross-lingual Transfer with Contrastive Negative Learning and Self-training
Guanlin Li | Xuechen Zhao | Amir Jafari | Wenhao Shao | Reza Farahbakhsh | Noel Crespi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Recent studies improve cross-lingual transfer learning by better aligning the internal representations within the multilingual model or by exploiting information from the target language through self-training. However, alignment-based methods exhibit intrinsic limitations, such as non-transferable linguistic elements, while most self-training-based methods ignore the useful information hidden in low-confidence samples. To address the latter issue, we propose CoNLST (Contrastive Negative Learning and Self-Training) to leverage the information in low-confidence samples. Specifically, we extend negative learning to the metric space by selecting negative pairs based on complementary labels, and then employ self-training to iteratively train the model to converge on the obtained clean pseudo-labels. We evaluate our approach on the widely adopted cross-lingual benchmark XNLI. The experimental results show that our method improves upon the baseline models and can serve as a beneficial complement to alignment-based methods.
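For intuition on the negative-learning component, the Python sketch below shows the standard complementary-label loss, which pushes probability mass away from a class that a low-confidence sample is believed not to belong to. The class count, batch size, and random logits are toy values, and the paper's extension of negative learning to the metric space with negative pairs is not reproduced here.

# Hedged sketch of complementary-label ("negative") learning.
import torch
import torch.nn.functional as F

def negative_learning_loss(logits: torch.Tensor, complementary_labels: torch.Tensor) -> torch.Tensor:
    """-log(1 - p_k) averaged over the batch, where k is the complementary label."""
    probs = F.softmax(logits, dim=-1)
    p_compl = probs.gather(1, complementary_labels.unsqueeze(1)).squeeze(1)
    return -torch.log(1.0 - p_compl + 1e-8).mean()

# Toy usage: 3-class NLI-style logits for 4 low-confidence samples, each paired
# with one class the pseudo-labeller believes it does NOT belong to.
logits = torch.randn(4, 3, requires_grad=True)
compl = torch.tensor([2, 0, 1, 1])
loss = negative_learning_loss(logits, compl)
loss.backward()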

2023

No offence, Bert - I insult only humans! Multilingual sentence-level attack on toxicity detection networks
Sergey Berezin | Reza Farahbakhsh | Noel Crespi
Findings of the Association for Computational Linguistics: EMNLP 2023

We introduce a simple yet efficient sentence-level attack on black-box toxicity detector models. By adding several positive words or sentences to the end of a hateful message, we are able to change the prediction of a neural network and pass the toxicity detection system check. This approach is shown to work on seven languages from three different language families. We also describe a defence mechanism against this attack and discuss its limitations.
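The mechanics of the attack can be illustrated in a few lines of Python: a benign, positive suffix is concatenated to the toxic message so that a score-based filter is diluted below its threshold. The lexicon-based scorer and the suffix below are toy stand-ins for illustration, not the detectors or word lists used in the paper.

# Toy illustration of the appended-positive-text attack.
POSITIVE_SUFFIX = " I love flowers and I wish everyone a wonderful day."

def toy_toxicity_score(text: str) -> float:
    """Toy scorer: share of words from a tiny 'toxic' lexicon (illustration only)."""
    toxic_lexicon = {"stupid", "idiot"}
    words = text.lower().split()
    return sum(w.strip(".,!") in toxic_lexicon for w in words) / max(len(words), 1)

def attack(message: str) -> str:
    """Dilute the toxic signal by concatenating positive text to the message."""
    return message + POSITIVE_SUFFIX

original = "You are an idiot!"
print(toy_toxicity_score(original))          # relatively high score
print(toy_toxicity_score(attack(original)))  # diluted by the appended suffix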