2023
My Boli: Code-mixed Marathi-English Corpora, Pretrained Language Models and Evaluation Benchmarks
Tanmay Chavan | Omkar Gokhale | Aditya Kane | Shantanu Patankar | Raviraj Joshi
Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023 (Findings)
Converge at WASSA 2023 Empathy, Emotion and Personality Shared Task: A Transformer-based Approach for Multi-Label Emotion Classification
Aditya Paranjape | Gaurav Kolhatkar | Yash Patwardhan | Omkar Gokhale | Shweta Dharmadhikari
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis
In this paper, we describe our approach to the “WASSA 2023 Shared-Task 1: Empathy Detection and Emotion Classification”. By accurately identifying emotions from textual sources of data, deep learning models can be trained to understand and interpret human emotions more effectively. The classification of emotions facilitates the creation of more emotionally intelligent systems that can better understand and respond to human emotions. We compared multiple transformer-based models for multi-label classification. Ensembling and oversampling were used to improve the performance of the system. A threshold-based voting mechanism applied to three models (Longformer, BERT, BigBird) yields the highest overall macro F1-score of 0.6605.
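The abstract does not spell out how the threshold-based voting works, so the following is only a minimal sketch of one plausible reading: each model emits a probability per emotion label, a model "votes" for a label when its probability reaches a cutoff, and a label is assigned when enough models vote for it. The function name `threshold_vote` and the parameters `prob_threshold` and `min_votes` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def threshold_vote(
    model_probs: Sequence[Sequence[float]],
    prob_threshold: float = 0.5,  # assumed per-model cutoff, not from the paper
    min_votes: int = 2,           # assumed number of agreeing models required
) -> List[int]:
    """Combine per-label probabilities from several models for one example.

    model_probs[m][k] is the probability model m assigns to label k.
    Returns a 0/1 decision for every label.
    """
    num_labels = len(model_probs[0])
    if any(len(probs) != num_labels for probs in model_probs):
        raise ValueError("all models must score the same label set")

    decisions = []
    for label in range(num_labels):
        # Count the models whose probability for this label reaches the cutoff.
        votes = sum(1 for probs in model_probs if probs[label] >= prob_threshold)
        decisions.append(1 if votes >= min_votes else 0)
    return decisions


# Three hypothetical models scoring three hypothetical labels.
example = [
    [0.9, 0.2, 0.60],
    [0.7, 0.4, 0.30],
    [0.1, 0.8, 0.55],
]
print(threshold_vote(example))
```

Requiring two of three votes makes this a majority vote per label; raising `min_votes` trades recall for precision, which matters in multi-label settings where rare emotions are easily missed.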
Team Converge at ProbSum 2023: Abstractive Text Summarization of Patient Progress Notes
Gaurav Kolhatkar | Aditya Paranjape | Omkar Gokhale | Dipali Kadam
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks
In this paper, we elaborate on our approach to Shared Task 1A of the BioNLP Workshop 2023, titled “Problem List Summarization”. With the increasing digitization of health records, a need arises for quick and precise summarization of large volumes of records. With the help of summarization, medical professionals can sift through multiple records in a short span of time without overlooking any crucial point. We use abstractive text summarization for this task and experiment with multiple state-of-the-art models such as Pegasus, BART, and T5, along with various pre-processing and data augmentation techniques, to generate summaries from patients’ progress notes. The evaluation metric for this task was the ROUGE-L score. From our experiments, we conclude that Pegasus is the best-performing model on the dataset, achieving a ROUGE-L F1 score of 0.2744 on the test dataset (3rd rank on the leaderboard).
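For readers unfamiliar with the metric, ROUGE-L F1 can be sketched as the harmonic mean of precision and recall computed from the longest common subsequence (LCS) of reference and candidate tokens. The version below is a simplified illustration that assumes lowercased whitespace tokenization and no stemming; it is not the shared task's official scorer, and the function names are hypothetical.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """Sentence-level ROUGE-L F1 under simple whitespace tokenization."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)  # share of candidate tokens in the LCS
    recall = lcs / len(ref_tokens)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


# Made-up strings for illustration only, not data from the shared task.
score = rouge_l_f1(
    "the patient has acute kidney injury",
    "patient has kidney injury",
)
print(round(score, 4))
```

Because the LCS only requires tokens to appear in the same order, not contiguously, a summary that drops words from the reference still scores well on precision while losing recall.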
2022
Optimize_Prime@DravidianLangTech-ACL2022: Emotion Analysis in Tamil
Omkar Gokhale | Shantanu Patankar | Onkar Litake | Aditya Mandke | Dipali Kadam
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
This paper aims to perform an emotion analysis of social media comments in Tamil. Emotion analysis is the process of identifying the emotional context of the text. In this paper, we present the findings obtained by Team Optimize_Prime in the ACL 2022 shared task “Emotion Analysis in Tamil.” The task aimed to classify social media comments into categories of emotion such as Joy, Anger, Trust, and Disgust. The task was further divided into two subtasks, one with 11 broad categories of emotions and the other with 31 specific categories of emotion. We implemented three different approaches to tackle this problem: transformer-based models, Recurrent Neural Networks (RNNs), and Ensemble models. XLM-RoBERTa performed the best on the first task with a macro-averaged F1 score of 0.27, while MuRIL provided the best results on the second task with a macro-averaged F1 score of 0.13.
Optimize_Prime@DravidianLangTech-ACL2022: Abusive Comment Detection in Tamil
Shantanu Patankar | Omkar Gokhale | Onkar Litake | Aditya Mandke | Dipali Kadam
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
This paper addresses the problem of abusive comment detection in low-resource Indic languages. Abusive comments are statements that are offensive to a person or a group of people. These comments are targeted toward individuals belonging to specific ethnicities, genders, caste, race, sexuality, etc. Abusive Comment Detection is a significant problem, especially with the recent rise in social media users. This paper presents the approach used by our team, Optimize_Prime, in the ACL 2022 shared task “Abusive Comment Detection in Tamil.” The task involves detecting and classifying YouTube comments in Tamil and code-mixed Tamil-English into multiple categories. We have used three methods to optimize our results: Ensemble models, Recurrent Neural Networks, and Transformers. On the Tamil data, MuRIL and XLM-RoBERTa were our best-performing models, with a macro-averaged F1 score of 0.43. On the code-mixed data, MuRIL and M-BERT provided the best results, with a macro-averaged F1 score of 0.45.
To Train or Not to Train: Predicting the Performance of Massively Multilingual Models
Shantanu Patankar | Omkar Gokhale | Onkar Litake | Aditya Mandke | Dipali Kadam
Proceedings of the First Workshop on Scaling Up Multilingual Evaluation