Eisha Halder

2024

MULTILATE: A Synthetic Dataset on AI-Generated MULTImodaL hATE Speech
Advaitha Vetagiri | Eisha Halder | Ayanangshu Das Majumder | Partha Pakray | Amitava Das
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

One of the pressing challenges society faces today is the rapid proliferation of online hate speech, exacerbated by the rise of AI-generated multimodal hate content. This new form of synthetically produced hate speech presents unprecedented challenges in detection and moderation. In response to the growing presence of such harmful content across social media platforms, this research introduces a groundbreaking solution:

2023

Multilingual Multimodal Text Detection in Indo-Aryan Languages
Nihar Jyoti Basisth | Eisha Halder | Tushar Sachan | Advaitha Vetagiri | Partha Pakray
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

Multi-language text detection and recognition in complex visual scenes is an essential yet challenging task. Traditional pipelines relying on optical character recognition (OCR) often fail to generalize across different languages, fonts, orientations and imaging conditions. This work proposes a novel approach using the YOLOv5 object detection model architecture for multi-language text detection in images and videos. We curate and annotate a new dataset of over 4,000 scene text images across four Indian languages and use specialized data augmentation techniques to improve model robustness. Transfer learning from a base YOLOv5 model pretrained on COCO is combined with tailored optimization strategies for multi-language text detection. Our approach achieves state-of-the-art performance, with over 90% accuracy on multi-language text detection across all four languages in our test set. We demonstrate the effectiveness of fine-tuning YOLOv5 for generalized multi-language text extraction across diverse fonts, scales, orientations, and visual contexts. Our approach’s high accuracy and generalizability could enable numerous applications involving multilingual text processing from imagery and video.

Co-authors

Tushar Sachan (1)

Venues

ICON (2)