Sanjay Singh


2022

MaNLP@SMM4H’22: BERT for Classification of Twitter Posts
Keshav Kapur | Rajitha Harikrishnan | Sanjay Singh
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

This work presents our straightforward approach to the shared task “Classification of tweets self-reporting age” organized by the “Social Media Mining for Health Applications (SMM4H)” workshop. The paper describes how we built a binary classification system that sorts tweets related to birthday posts into two classes: exact age (positive class) and non-exact age (negative class). We made two submissions that differed in text preprocessing; they yielded F1 scores of 0.80 and 0.81 when evaluated by the organizers.

2021

Evaluating Gender Bias in Hindi-English Machine Translation
Krithika Ramesh | Gauri Gupta | Sanjay Singh
Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing

With language models increasingly deployed in the real world, it is essential to address the fairness of their outputs. The word embedding representations of these language models often implicitly draw unwanted associations that form a social bias within the model. Gendered languages like Hindi pose an additional problem for the quantification and mitigation of bias, because the forms of words in a sentence change with the gender of the subject. Additionally, little work has been done on measuring and debiasing systems for Indic languages. In our work, we attempt to evaluate and quantify the gender bias within a Hindi-English machine translation system. We implement a modified version of the existing TGBI metric based on the grammatical considerations for Hindi. We also compare the resulting bias measurements across multiple metrics for pre-trained embeddings and for the embeddings learned by our machine translation model.