Annajiat Alim Rasel
2026
Introducing a Bangla Sentence – Gloss Pair Dataset for Bangla Sign Language Translation and Research
Neelavro Saha | Rafi Shahriyar | Nafis Ashraf Roudra | Saadman Sakib | Annajiat Alim Rasel
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Bangla Sign Language (BdSL) translation is a low-resource NLP task because large-scale datasets for sentence-level translation are lacking. Consequently, existing research in this field has been limited to word- and alphabet-level detection. In this work, we introduce Bangla-SGP, a novel parallel dataset of 1,000 human-annotated sentence–gloss pairs, augmented with around 3,000 synthetically generated pairs produced by a rule-based Retrieval-Augmented Generation (RAG) pipeline that applies syntactic and morphological rules. Each spoken Bangla sentence is paired with a gloss sequence made up of individual glosses, which are Bangla words supported by a sign and serve as an intermediate representation of a continuous sign. The 1,000 high-quality Bangla sentences were manually annotated into gloss sequences by a professional signer. The augmentation process combines rule-based linguistic strategies with prompt engineering techniques, which we developed by critically analyzing our human-annotated sentence–gloss pairs and by working closely with our professional signer. Furthermore, we fine-tune several transformer-based models, including mBART-50, Google mT5, and GPT-4.1 nano, and evaluate their sentence-to-gloss translation performance using BLEU scores. Based on these scores, we compare the models' gloss-translation consistency across our dataset and the RWTH-PHOENIX-2014T benchmark.
2023
BanglaClickBERT: Bangla Clickbait Detection from News Headlines using Domain Adaptive BanglaBERT and MLP Techniques
Saman Sarker Joy | Tanusree Das Aishi | Naima Tahsin Nodi | Annajiat Alim Rasel
Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association
News headlines or titles that deliberately persuade readers to view particular online content are referred to as clickbait. Numerous studies have addressed clickbait detection in English, but very little research has addressed clickbait detection in Bangla news headlines. In this study, we experimented with several transformer models, namely BanglaBERT and XLM-RoBERTa. Additionally, we introduced a domain-adaptive pretrained model, BanglaClickBERT. We conducted a series of experiments to identify the most effective model. The dataset used in this study contained 15,056 labeled and 65,406 unlabeled news headlines. In addition, we scraped more unlabeled Bangla news headlines from clickbait-dense websites, for a total of 1 million unlabeled news headlines used to build BanglaClickBERT. Our approach surpasses the performance of existing state-of-the-art methods, providing a more accurate and efficient solution for detecting clickbait in Bangla news headlines, with potential implications for improving online content quality and user experience.