Savita Bhat


2023

pdf
Hiding in Plain Sight: Insights into Abstractive Text Summarization
Vivek Srivastava | Savita Bhat | Niranjan Pedanekar
Proceedings of the Fourth Workshop on Insights from Negative Results in NLP

In recent years, there has been growing interest in the field of abstractive text summarization with focused contributions in relevant model architectures, datasets, and evaluation metrics. Despite notable research advances, previous works have identified certain limitations concerning the quality of datasets and the effectiveness of evaluation techniques for generated summaries. In this context, we examine these limitations further with the help of three quality measures, namely, Information Coverage, Entity Hallucination, and Summarization Complexity. As a part of this work, we investigate two widely used datasets (XSUM and CNNDM) and three existing models (BART, PEGASUS, and BRIO) and report our findings. Some key insights are: 1) Cumulative ROUGE score is an inappropriate evaluation measure since few high-scoring samples dominate the overall performance, 2) Existing summarization models have limited capability for information coverage and hallucinate to generate factual information, and 3) Compared to the model generated summaries, the reference summaries have lowest information coverage and highest entity hallucinations reiterating the need of new and better reference summaries.

pdf
Large Language Models As Annotators: A Preliminary Evaluation For Annotating Low-Resource Language Content
Savita Bhat | Vasudeva Varma
Proceedings of the 4th Workshop on Evaluation and Comparison of NLP Systems

The process of collecting human-generated annotations is time-consuming and resource-hungry. In the case of low-resource (LR) languages such as Indic languages, these efforts are more expensive due to the dearth of data and human experts. Considering their importance in solving downstream applications, there have been concentrated efforts exploring alternatives for human-generated annotations. To that extent, we seek to evaluate multilingual large language models (LLMs) for their potential to substitute or aid human-generated annotation efforts. We use LLMs to re-label publicly available datasets in LR languages for the tasks of natural language inference, sentiment analysis, and news classification. We compare these annotations with existing ground truth labels to analyze the efficacy of using LLMs for annotation tasks. We observe that the performance of these LLMs varies substantially across different tasks and languages. The results show that off-the-shelf use of multilingual LLMs is not appropriate and results in poor performance in two of the three tasks.

pdf
Generative Models For Indic Languages: Evaluating Content Generation Capabilities
Savita Bhat | Vasudeva Varma | Niranjan Pedanekar
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Large language models (LLMs) and generative AI have emerged as the most important areas in the field of natural language processing (NLP). LLMs are considered to be a key component in several NLP tasks, such as summarization, question-answering, sentiment classification, and translation. Newer LLMs, such as ChatGPT, BLOOMZ, and several such variants, are known to train on multilingual training data and hence are expected to process and generate text in multiple languages. Considering the widespread use of LLMs, evaluating their efficacy in multilingual settings is imperative. In this work, we evaluate the newest generative models (ChatGPT, mT0, and BLOOMZ) in the context of Indic languages. Specifically, we consider natural language generation (NLG) applications such as summarization and question-answering in monolingual and cross-lingual settings. We observe that current generative models have limited capability for generating text in Indic languages in a zero-shot setting. In contrast, generative models perform consistently better on manual quality-based evaluation in both Indic languages and English language generation. Considering limited generation performance, we argue that these LLMs are not intended to use in zero-shot fashion in downstream applications.

2021

pdf
What BERTs and GPTs know about your brand? Probing contextual language models for affect associations
Vivek Srivastava | Stephen Pilli | Savita Bhat | Niranjan Pedanekar | Shirish Karande
Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

Investigating brand perception is fundamental to marketing strategies. In this regard, brand image, defined by a set of attributes (Aaker, 1997), is recognized as a key element in indicating how a brand is perceived by various stakeholders such as consumers and competitors. Traditional approaches (e.g., surveys) to monitor brand perceptions are time-consuming and inefficient. In the era of digital marketing, both brand managers and consumers engage with a vast amount of digital marketing content. The exponential growth of digital content has propelled the emergence of pre-trained language models such as BERT and GPT as essential tools in solving myriads of challenges with textual data. This paper seeks to investigate the extent of brand perceptions (i.e., brand and image attribute associations) these language models encode. We believe that any kind of bias for a brand and attribute pair may influence customer-centric downstream tasks such as recommender systems, sentiment analysis, and question-answering, e.g., suggesting a specific brand consistently when queried for innovative products. We use synthetic data and real-life data and report comparison results for five contextual LMs, viz. BERT, RoBERTa, DistilBERT, ALBERT and BART.

2018

pdf
EmotionX-Area66: Predicting Emotions in Dialogues using Hierarchical Attention Network with Sequence Labeling
Rohit Saxena | Savita Bhat | Niranjan Pedanekar
Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media

This paper presents our system submitted to the EmotionX challenge. It is an emotion detection task on dialogues in the EmotionLines dataset. We formulate this as a hierarchical network where network learns data representation at both utterance level and dialogue level. Our model is inspired by Hierarchical Attention network (HAN) and uses pre-trained word embeddings as features. We formulate emotion detection in dialogues as a sequence labeling problem to capture the dependencies among labels. We report the performance accuracy for four emotions (anger, joy, neutral and sadness). The model achieved unweighted accuracy of 55.38% on Friends test dataset and 56.73% on EmotionPush test dataset. We report an improvement of 22.51% in Friends dataset and 36.04% in EmotionPush dataset over baseline results.