Sauleh Eetemadi

2024

pdf abs
IUST at ClimateActivism 2024: Towards Optimal Stance Detection: A Systematic Study of Architectural Choices and Data Cleaning Techniques
Ghazaleh Mahmoudi | Sauleh Eetemadi
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

This work presents a systematic search of various model architecture configurations and data cleaning methods. The study evaluates the impact of data cleaning methods on the obtained results. Additionally, we demonstrate that a combination of CNN and Encoder-only models such as BERTweet outperforms FNNs. Moreover, by utilizing data augmentation, we are able to overcome the challenge of data imbalance.

pdf abs
BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense
Baktash Ansari | Mohammadmostafa Rostamkhani | Sauleh Eetemadi
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper outlines our approach to SemEval 2024 Task 9, BRAINTEASER: A Novel Task Defying Common Sense. The task aims to evaluate the ability of language models to think creatively. The dataset comprises multi-choice questions that challenge models to think ‘outside of the box’. We fine-tune 2 models, BERT and RoBERTa Large. Next, we employ a Chain of Thought (CoT) zero-shot prompting approach with 6 large language models, such as GPT-3.5, Mixtral, and Llama2. Finally, we utilize ReConcile, a technique that employs a ‘round table conference’ approach with multiple agents for zero-shot learning, to generate consensus answers among 3 selected language models. Our best method achieves an overall accuracy of 85 percent on the sentence puzzles subtask.

pdf abs
eagerlearners at SemEval2024 Task 5: The Legal Argument Reasoning Task in Civil Procedure
Hoorieh Sabzevari | Mohammadmostafa Rostamkhani | Sauleh Eetemadi
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This study investigates the performance of the zero-shot method in classifying data using three large language models, alongside two models with large input token sizes and the two pre-trained models on legal data. Our main dataset comes from the domain of U.S. civil procedure. It includes summaries of legal cases, specific questions, potential answers, and detailed explanations for why each solution is relevant, all sourced from a book aimed at law students. By comparing different methods, we aimed to understand how effectively they handle the complexities found in legal datasets. Our findings show how well the zero-shot method of large language models can understand complicated data. We achieved our highest F1 score of 64% in these experiments.

pdf abs
ROSHA at SemEval-2024 Task 9: BRAINTEASER A Novel Task Defying Common Sense
Mohammadmostafa Rostamkhani | Shayan Mousavinia | Sauleh Eetemadi
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

In our exploration of SemEval 2024 Task 9, specifically the challenging BRAINTEASER: A Novel Task Defying Common Sense, we employed various strategies for the BRAINTEASER QA task, which encompasses both sentence and word puzzles. In the initial approach, we applied the XLM-RoBERTa model both to the original training dataset and concurrently to the original dataset alongside the BiRdQA dataset and the original dataset alongside RiddleSense for comprehensive model training.Another strategy involved expanding each word within our BiRdQA dataset into a full sentence. This unique perspective aimed to enhance the semantic impact of individual words in our training regimen for word puzzle (WP) riddles. Utilizing ChatGPT-3.5, we extended each word into an extensive sentence, applying this process to all options within each riddle.Furthermore, we explored the implementation of RECONCILE (Round-table conference) using three prominent large language models—ChatGPT, Gemini, and the Mixtral-8x7B Large Language Model (LLM). As a final approach, we leveraged GPT-4 results. Remarkably, our most successful experiment yielded noteworthy results, achieving a score of 0.900 for sentence puzzles (S_ori) and 0.906 for word puzzles (W_ori).

pdf abs
IUSTNLPLAB at SemEval-2024 Task 4: Multilingual Detection of Persuasion Techniques in Memes
Mohammad Osoolian | Erfan Moosavi Monazzah | Sauleh Eetemadi
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper outlines our approach to SemEval-2024 Task 4: Multilingual Detection of Persuasion Techniques in Memes, specifically addressing subtask 1. The study focuses on model fine-tuning using language models, including BERT, GPT-2, and RoBERTa, with the experiment results demonstrating optimal performance with GPT-2. Our system submission achieved a competitive ranking of 17th out of 33 teams in subtask 1, showcasing the effectiveness of the employed methodology in the context of persuasive technique identification within meme texts.

pdf abs
IUST-NLPLAB at SemEval-2024 Task 9: BRAINTEASER By MPNet (Sentence Puzzle)
Mohammad Hossein Abbaspour | Erfan Moosavi Monazzah | Sauleh Eetemadi
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This study addresses a task encompassing two distinct subtasks: Sentence-puzzle and Word-puzzle. Our primary focus lies within the Sentence-puzzle subtask, which involves discerning the correct answer from a set of three options for a given riddle constructed from sentence fragments. We propose four distinct methodologies tailored to address this subtask effectively. Firstly, we introduce a zero-shot approach leveraging the capabilities of the GPT-3.5 model. Additionally, we present three fine-tuning methodologies utilizing MPNet as the underlying architecture, each employing a different loss function. We conduct comprehensive evaluations of these methodologies on the designated task dataset and meticulously document the obtained results. Furthermore, we conduct an in-depth analysis to ascertain the respective strengths and weaknesses of each method. Through this analysis, we aim to provide valuable insights into the challenges inherent to this task domain.

2023

pdf abs
GYM at Qur’an QA 2023 Shared Task: Multi-Task Transfer Learning for Quranic Passage Retrieval and Question Answering with Large Language Models
Ghazaleh Mahmoudi | Yeganeh Morshedzadeh | Sauleh Eetemadi
Proceedings of ArabicNLP 2023

This work addresses the challenges of question answering for vintage texts like the Quran. It introduces two tasks: passage retrieval and reading comprehension. For passage retrieval, it employs unsupervised fine-tuning sentence encoders and supervised multi-task learning. In reading comprehension, it fine-tunes an Electra-based model, demonstrating significant improvements over baseline models. Our best AraElectra model achieves 46.1% partial Average Precision (pAP) on the unseen test set, outperforming the baseline by 23%.

pdf abs
IUST at ImageArg: The First Shared Task in Multimodal Argument Mining
Melika Nobakhtian | Ghazal Zamaninejad | Erfan Moosavi Monazzah | Sauleh Eetemadi
Proceedings of the 10th Workshop on Argument Mining

ImageArg is a shared task at the 10th ArgMining Workshop at EMNLP 2023. It leverages the ImageArg dataset to advance multimodal persuasiveness techniques. This challenge comprises two distinct subtasks: 1) Argumentative Stance (AS) Classification: Assessing whether a given tweet adopts an argumentative stance. 2) Image Persuasiveness (IP) Classification: Determining if the tweet image enhances the persuasive quality of the tweet. We conducted various experiments on both subtasks and ranked sixth out of the nine participating teams.

pdf abs
A longitudinal study about gradual changes in the Iranian Online Public Sphere pre and post of ‘Mahsa Moment’: Focusing on Twitter
Sadegh Jafari | Amin Fathi | Abolfazl Hajizadegan | Amirmohammad Kazemeini | Sauleh Eetemadi
Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change

Mahsa Amini’s death shocked Iranian society. The effects of this event and the subsequent tragedies in Iran not only in realspace but also in cyberspace, including Twitter, were tremendous and unimaginable. We explore how Twitter has changed after Mahsa Amini’s death by analyzing the sentiments of Iranian users in the 90 days after this event. Additionally, we track the change in word meaning and each word’s neighboring words. Finally, we use word clustering methods for topic modeling.

pdf abs
PMCoders at SemEval-2023 Task 1: RAltCLIP: Use Relative AltCLIP Features to Rank
Mohammad Javad Pirhadi | Motahhare Mirzaei | Mohammad Reza Mohammadi | Sauleh Eetemadi
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Visual Word Sense Disambiguation (VWSD) task aims to find the most related image among 10 images to an ambiguous word in some limited textual context. In this work, we use AltCLIP features and a 3-layer standard transformer encoder to compare the cosine similarity between the given phrase and different images. Also, we improve our model’s generalization by using a subset of LAION-5B. The best official baseline achieves 37.20% and 54.39% macro-averaged hit rate and MRR (Mean Reciprocal Rank) respectively. Our best configuration reaches 39.61% and 56.78% macro-averaged hit rate and MRR respectively. The code will be made publicly available on GitHub.

pdf abs
ROZAM at SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis
Mohammadmostafa Rostamkhani | Ghazal Zamaninejad | Sauleh Eetemadi
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

We build a model using large multilingual pretrained language model XLM-T for regression task and fine-tune it on the MINT (Multilingual INTmacy) analysis dataset which covers 6 languages for training and 4 languages for testing zero-shot performance of the model. The dataset was annotated and the annotations are intimacy scores. We experiment with several deep learning architectures to predict intimacy score. To achieve optimal performance we modify several model settings including loss function, number and type of layers. In total, we ran 16 end-to-end experiments. Our best system achieved a Pearson Correlation score of 0.52.

pdf abs
Prodicus at SemEval-2023 Task 4: Enhancing Human Value Detection with Data Augmentation and Fine-Tuned Language Models
Erfan Moosavi Monazzah | Sauleh Eetemadi
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper introduces a data augmentation technique for the task of detecting human values. Our approach involves generating additional examples using metadata that describes the labels in the datasets. We evaluated the effectiveness of our method by fine-tuning BERT and RoBERTa models on our augmented dataset and comparing their F1 -scores to those of the non-augmented dataset. We obtained competitive results on both the Main test set and the Nahj al-Balagha test set, ranking 14th and 7th respectively among the participants. We also demonstrate that by incorporating our augmentation technique, the classification performance of BERT and RoBERTa is improved, resulting in an increase of up to 10.1% in their F1-score.

2022

pdf abs
Using Two Losses and Two Datasets Simultaneously to Improve TempoWiC Accuracy
Mohammad Javad Pirhadi | Motahhare Mirzaei | Sauleh Eetemadi
Proceedings of the First Workshop on Ever Evolving NLP (EvoNLP)

WSD (Word Sense Disambiguation) is the task of identifying which sense of a word is meant in a sentence or other segment of text. Researchers have worked on this task (e.g. Pustejovsky, 2002) for years but it’s still a challenging one even for SOTA (state-of-the-art) LMs (language models). The new dataset, TempoWiC introduced by Loureiro et al. (2022b) focuses on the fact that words change over time. Their best baseline achieves 70.33% macro-F1. In this work, we use two different losses simultaneously. We also improve our model by using another similar dataset to generalize better. Our best configuration beats their best baseline by 4.23%.

pdf abs
Pars-ABSA: a Manually Annotated Aspect-based Sentiment Analysis Benchmark on Farsi Product Reviews
Taha Shangipour ataei | Kamyar Darvishi | Soroush Javdan | Behrouz Minaei-Bidgoli | Sauleh Eetemadi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Due to the increased availability of online reviews, sentiment analysis witnessed a thriving interest from researchers. Sentiment analysis is a computational treatment of sentiment used to extract and understand the opinions of authors. While many systems were built to predict the sentiment of a document or a sentence, many others provide the necessary detail on various aspects of the entity (i.e., aspect-based sentiment analysis). Most of the available data resources were tailored to English and the other popular European languages. Although Farsi is a language with more than 110 million speakers, to the best of our knowledge, there is a lack of proper public datasets on aspect-based sentiment analysis for Farsi. This paper provides a manually annotated Farsi dataset, Pars-ABSA, annotated and verified by three native Farsi speakers. The dataset consists of 5,114 positive, 3,061 negative and 1,827 neutral data samples from 5,602 unique reviews. Moreover, as a baseline, this paper reports the performance of some aspect-based sentiment analysis methods focusing on transfer learning on Pars-ABSA.

2021

Training and evaluation of automatic fact extraction and verification techniques require large amounts of annotated data which might not be available for low-resource languages. This paper presents ParsFEVER: the first publicly available Farsi dataset for fact extraction and verification. We adopt the construction procedure of the standard English dataset for the task, i.e., FEVER, and improve it for the case of low-resource languages. Specifically, claims are extracted from sentences that are carefully selected to be more informative. The dataset comprises nearly 23K manually-annotated claims. Over 65% of the claims in ParsFEVER are many-hop (require evidence from multiple sources), making the dataset a challenging benchmark (only 13% of the claims in FEVER are many-hop). Also, despite having a smaller training set (around one-ninth of that in Fever), a model trained on ParsFEVER attains similar downstream performance, indicating the quality of the dataset. We release the dataset and the annotation guidelines at https://github.com/Zarharan/ParsFEVER.

Sauleh Eetemadi

2024

2023

2022

2021

2015

2014

2013

Co-authors

Venues