2024
BabelBot at AraFinNLP2024: Fine-tuning T5 for Multi-dialect Intent Detection with Synthetic Data and Model Ensembling
Murhaf Fares | Samia Touileb
Proceedings of The Second Arabic Natural Language Processing Conference
This paper presents our results for the Arabic Financial NLP (AraFinNLP) shared task at the Second Arabic Natural Language Processing Conference (ArabicNLP 2024). We participated in the first sub-task, Multi-dialect Intent Detection, which focused on cross-dialect intent detection in the banking domain. Our approach involved fine-tuning an encoder-only T5 model, generating synthetic data, and model ensembling. Additionally, we conducted an in-depth analysis of the dataset, addressing annotation errors and problematic translations. Our model was ranked third in the shared task, achieving an F1-score of 0.871.
AraT5-MSAizer: Translating Dialectal Arabic to MSA
Murhaf Fares
Proceedings of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation @ LREC-COLING 2024
This paper outlines the process of training the AraT5-MSAizer model, a transformer-based neural machine translation model aimed at translating five regional Arabic dialects into Modern Standard Arabic (MSA). Developed for Task 2 of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools, the model attained a BLEU score of 21.79% on the test set associated with this task.
2018
Transfer and Multi-Task Learning for Noun–Noun Compound Interpretation
Murhaf Fares | Stephan Oepen | Erik Velldal
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
In this paper, we empirically evaluate the utility of transfer and multi-task learning on a challenging semantic classification task: semantic interpretation of noun–noun compounds. Through a comprehensive series of experiments and in-depth error analysis, we show that transfer learning via parameter initialization and multi-task learning via parameter sharing can help a neural classification model generalize over a highly skewed distribution of relations. Further, we demonstrate how dual annotation with two distinct sets of relations over the same set of compounds can be exploited to improve the overall accuracy of a neural classifier and its F1 scores on the less frequent, but more difficult relations.
The 2018 Shared Task on Extrinsic Parser Evaluation: On the Downstream Utility of English Universal Dependency Parsers
Murhaf Fares | Stephan Oepen | Lilja Øvrelid | Jari Björne | Richard Johansson
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
We summarize empirical results and tentative conclusions from the Second Extrinsic Parser Evaluation Initiative (EPE 2018). We review the basic task setup, downstream applications involved, and end-to-end results for seventeen participating teams. Based on in-depth quantitative and qualitative analysis, we correlate intrinsic evaluation results at different layers of morpho-syntactic analysis with observed downstream behavior.
2017
Word vectors, reuse, and replicability: Towards a community repository of large-text resources
Murhaf Fares | Andrey Kutuzov | Stephan Oepen | Erik Velldal
Proceedings of the 21st Nordic Conference on Computational Linguistics
2016
A Dataset for Joint Noun-Noun Compound Bracketing and Interpretation
Murhaf Fares
Proceedings of the ACL 2016 Student Research Workshop