Shadi Tavakoli


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2025

pdf bib
Boosting Sentiment Analysis in Persian through a GAN-Based Synthetic Data Augmentation Method
Masoumeh Mohammadi | Mohammad Ruhul Amin | Shadi Tavakoli
Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script

This paper presents a novel Sentiment Analysis (SA) dataset in the low-resource Persian language including a data augmentation technique using Generative Adversarial Networks (GANs) to generate synthetic data, boosting the volume and variety of data, for achieving state-of-the-art performance. We propose a novel annotated SA dataset, called Senti-Persian, made of 67,743 public comments on movie reviews from Iranian websites (Namava, Filimo and Aparat) and social media (YouTube, Twitter and Instagram). These reviews are labeled with one of the polarity labels, namely positive, negative, and neutral. Our study includes a novel text augmentation model based on GANs. The generator was designed following the linguistic properties of Persian linguistics, while the discriminator was designed based on the cosine similarity of the vectorized original and generated sentences, i.e. using CLS-embedings of BERT. A SA task applied on both collected and augmented datasets for which we observed a significant improvement in the accuracy from 88.4% for the original dataset to the 96% when augmented with synthetic data. Senti-Parsian dataset including the original and the augmented ones will be available on github.