2024
pdf
abs
SemanticCuetSync at AraFinNLP2024: Classification of Cross-Dialect Intent in the Banking Domain using Transformers
Ashraful Paran
|
Symom Shohan
|
Md. Hossain
|
Jawad Hossain
|
Shawly Ahsan
|
Mohammed Moshiul Hoque
Proceedings of The Second Arabic Natural Language Processing Conference
Intention detection is a crucial aspect of natural language understanding (NLU), focusing on identifying the primary objective underlying user input. In this work, we present a transformer-based method that excels in determining the intent of Arabic text within the banking domain. We explored several machine learning (ML), deep learning (DL), and transformer-based models on an Arabic banking dataset for intent detection. Our findings underscore the challenges that traditional ML and DL models face in understanding the nuances of various Arabic dialects, leading to subpar performance in intent detection. However, the transformer-based methods, designed to tackle such complexities, significantly outperformed the other models in classifying intent across different Arabic dialects. Notably, the AraBERTv2 model achieved the highest micro F1 score of 82.08% in ArBanking77 dataset, a testament to its effectiveness in this context. This achievement, which contributed to our work being ranked 5th in the shared task, AraFinNLP2024, highlights the importance of developing models that can effectively handle the intricacies of Arabic language processing and intent detection.
pdf
abs
SemanticCuetSync at ArAIEval Shared Task: Detecting Propagandistic Spans with Persuasion Techniques Identification using Pre-trained Transformers
Symom Shohan
|
Md. Hossain
|
Ashraful Paran
|
Shawly Ahsan
|
Jawad Hossain
|
Mohammed Moshiul Hoque
Proceedings of The Second Arabic Natural Language Processing Conference
Detecting propagandistic spans and identifying persuasion techniques are crucial for promoting informed decision-making, safeguarding democratic processes, and fostering a media environment characterized by integrity and transparency. Various machine learning (Logistic Regression, Random Forest, and Multinomial Naive Bayes), deep learning (CNN, CNN+LSTM, CNN+BiLSTM), and transformer-based (AraBERTv2, AraBERT-NER, CamelBERT, BERT-Base-Arabic) models were exploited to perform the task. The evaluation results indicate that CamelBERT achieved the highest micro-F1 score (24.09%), outperforming CNN+LSTM and AraBERTv2. The study found that most models struggle to detect propagandistic spans when multiple spans are present within the same article. Overall, the model’s performance secured a 6th place ranking in the ArAIEval Shared Task-1.
2023
pdf
abs
Interpreting Indirect Answers to Yes-No Questions in Multiple Languages
Zijie Wang
|
Md Hossain
|
Shivam Mathur
|
Terry Melo
|
Kadir Ozler
|
Keun Park
|
Jacob Quintero
|
MohammadHossein Rezaei
|
Shreya Shakya
|
Md Uddin
|
Eduardo Blanco
Findings of the Association for Computational Linguistics: EMNLP 2023
Yes-no questions expect a yes or no for an answer, but people often skip polar keywords. Instead, they answer with long explanations that must be interpreted. In this paper, we focus on this challenging problem and release new benchmarks in eight languages. We present a distant supervision approach to collect training data, and demonstrate that direct answers (i.e., with polar keywords) are useful to train models to interpret indirect answers (i.e., without polar keywords). We show that monolingual fine-tuning is beneficial if training data can be obtained via distant supervision for the language of interest (5 languages). Additionally, we show that cross-lingual fine-tuning is always beneficial (8 languages).