Vasudeva Varma


2022

pdf
Leveraging Mental Health Forums for User-level Depression Detection on Social Media
Sravani Boinepelli | Tathagata Raha | Harika Abburi | Pulkit Parikh | Niyati Chhaya | Vasudeva Varma
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The number of depression and suicide risk cases on social media platforms is ever-increasing, and the lack of depression detection mechanisms on these platforms is becoming increasingly apparent. A majority of work in this area has focused on leveraging linguistic features while dealing with small-scale datasets. However, one faces many obstacles when factoring into account the vastness and inherent imbalance of social media content. In this paper, we aim to optimize the performance of user-level depression classification to lessen the burden on computational resources. The resulting system executes in a quicker, more efficient manner, in turn making it suitable for deployment. To simulate a platform agnostic framework, we simultaneously replicate the size and composition of social media to identify victims of depression. We systematically design a solution that categorizes post embeddings, obtained by fine-tuning transformer models such as RoBERTa, and derives user-level representations using hierarchical attention networks. We also introduce a novel mental health dataset to enhance the performance of depression categorization. We leverage accounts of depression taken from this dataset to infuse domain-specific elements into our framework. Our proposed methods outperform numerous baselines across standard metrics for the task of depression detection in text.

pdf
An Ensemble Approach to Detect Emotions at an Essay Level
Himanshu Maheshwari | Vasudeva Varma
Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis

This paper describes our system (IREL, reffered as himanshu.1007 on Codalab) for Shared Task on Empathy Detection, Emotion Classification, and Personality Detection at 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis at ACL 2022. We participated in track 2 for predicting emotion at the essay level. We propose an ensemble approach that leverages the linguistic knowledge of the RoBERTa, BART-large, and RoBERTa model finetuned on the GoEmotions dataset. Each brings in its unique advantage, as we discuss in the paper. Our proposed system achieved a Macro F1 score of 0.585 and ranked one out of thirteen teams

pdf
IIITH at SemEval-2022 Task 5: A comparative study of deep learning models for identifying misogynous memes
Tathagata Raha | Sagar Joshi | Vasudeva Varma
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper provides a comparison of different deep learning methods for identifying misogynous memes for SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification. In this task, we experiment with architectures in the identification of misogynous content in memes by making use of text and image-based information. The different deep learning methods compared in this paper are: (i) unimodal image or text models (ii) fusion of unimodal models (iii) multimodal transformers models and (iv) transformers further pretrained on a multimodal task. From our experiments, we found pretrained multimodal transformer architectures to strongly outperform the models involving the fusion of representation from both the modalities.

pdf
IIIT-MLNS at SemEval-2022 Task 8: Siamese Architecture for Modeling Multilingual News Similarity
Sagar Joshi | Dhaval Taunk | Vasudeva Varma
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

The task of multilingual news article similarity entails determining the degree of similarity of a given pair of news articles in a language-agnostic setting. This task aims to determine the extent to which the articles deal with the entities and events in question without much consideration of the subjective aspects of the discourse. Considering the superior representations being given by these models as validated on other tasks in NLP across an array of high and low-resource languages and this task not having any restricted set of languages to focus on, we adopted using the encoder representations from these models as our choice throughout our experiments. For modeling the similarity task by using the representations given by these models, a Siamese architecture was used as the underlying architecture. In experimentation, we investigated on several fronts including features passed to the encoder model, data augmentation and ensembling among our major experiments. We found data augmentation to be the most effective working strategy among our experiments.

pdf
Towards Capturing Changes in Mood and Identifying Suicidality Risk
Sravani Boinepelli | Shivansh Subramanian | Abhijeeth Singam | Tathagata Raha | Vasudeva Varma
Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology

This paper describes our systems for CLPsych?s 2022 Shared Task. Subtask A involves capturing moments of change in an individual?s mood over time, while Subtask B asked us to identify the suicidality risk of a user. We explore multiple machine learning and deep learning methods for the same, taking real-life applicability into account while considering the design of the architecture. Our team achieved top results in different categories for both subtasks. Task A was evaluated on a post-level (using macro averaged F1) and on a window-based timeline level (using macro-averaged precision and recall). We scored a post-level F1 of 0.520 and ranked second with a timeline-level recall of 0.646. Task B was a user-level task where we also came in second with a micro F1 of 0.520 and scored third place on the leaderboard with a macro F1 of 0.380.

2021

pdf
IIITH at SemEval-2021 Task 7: Leveraging transformer-based humourous and offensive text detection architectures using lexical and hurtlex features and task adaptive pretraining
Tathagata Raha | Ishan Sanjeev Upadhyay | Radhika Mamidi | Vasudeva Varma
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper describes our approach (IIITH) for SemEval-2021 Task 5: HaHackathon: Detecting and Rating Humor and Offense. Our results focus on two major objectives: (i) Effect of task adaptive pretraining on the performance of transformer based models (ii) How does lexical and hurtlex features help in quantifying humour and offense. In this paper, we provide a detailed description of our approach along with comparisions mentioned above.

pdf
SciBERT Sentence Representation for Citation Context Classification
Himanshu Maheshwari | Bhavyajeet Singh | Vasudeva Varma
Proceedings of the Second Workshop on Scholarly Document Processing

This paper describes our system (IREL) for 3C-Citation Context Classification shared task of the Scholarly Document Processing Workshop at NAACL 2021. We participated in both subtask A and subtask B. Our best system achieved a Macro F1 score of 0.26973 on the private leaderboard for subtask A and was ranked one. For subtask B our best system achieved a Macro F1 score of 0.59071 on the private leaderboard and was ranked two. We used similar models for both the subtasks with some minor changes, as discussed in this paper. Our best performing model for both the subtask was a finetuned SciBert model followed by a linear layer. This paper provides a detailed description of all the approaches we tried and their results.

pdf
Cross-lingual Alignment of Knowledge Graph Triples with Sentences
Swayatta Daw | Shivprasad Sagare | Tushar Abhishek | Vikram Pudi | Vasudeva Varma
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

The pairing of natural language sentences with knowledge graph triples is essential for many downstream tasks like data-to-text generation, facts extraction from sentences (semantic parsing), knowledge graph completion, etc. Most existing methods solve these downstream tasks using neural-based end-to-end approaches that require a large amount of well-aligned training data, which is difficult and expensive to acquire. Recently various unsupervised techniques have been proposed to alleviate this alignment step by automatically pairing the structured data (knowledge graph triples) with textual data. However, these approaches are not well suited for low resource languages that provide two major challenges: (1) unavailability of pair of triples and native text with the same content distribution and (2) limited Natural language Processing (NLP) resources. In this paper, we address the unsupervised pairing of knowledge graph triples with sentences for low resource languages, selecting Hindi as the low resource language. We propose cross-lingual pairing of English triples with Hindi sentences to mitigate the unavailability of content overlap. We propose two novel approaches: NER-based filtering with Semantic Similarity and Key-phrase Extraction with Relevance Ranking. We use our best method to create a collection of 29224 well-aligned English triples and Hindi sentence pairs. Additionally, we have also curated 350 human-annotated golden test datasets for evaluation. We make the code and dataset publicly available.

2020

pdf
Summaformers @ LaySumm 20, LongSumm 20
Sayar Ghosh Roy | Nikhil Pinnaparaju | Risubh Jain | Manish Gupta | Vasudeva Varma
Proceedings of the First Workshop on Scholarly Document Processing

Automatic text summarization has been widely studied as an important task in natural language processing. Traditionally, various feature engineering and machine learning based systems have been proposed for extractive as well as abstractive text summarization. Recently, deep learning based, specifically Transformer-based systems have been immensely popular. Summarization is a cognitively challenging task – extracting summary worthy sentences is laborious, and expressing semantics in brief when doing abstractive summarization is complicated. In this paper, we specifically look at the problem of summarizing scientific research papers from multiple domains. We differentiate between two types of summaries, namely, (a) LaySumm: A very short summary that captures the essence of the research paper in layman terms restricting overtly specific technical jargon and (b) LongSumm: A much longer detailed summary aimed at providing specific insights into various ideas touched upon in the paper. While leveraging latest Transformer-based models, our systems are simple, intuitive and based on how specific paper sections contribute to human summaries of the two types described above. Evaluations against gold standard summaries using ROUGE metrics prove the effectiveness of our approach. On blind test corpora, our system ranks first and third for the LongSumm and LaySumm tasks respectively.

pdf
Predicting Clickbait Strength in Online Social Media
Vijayasaradhi Indurthi | Bakhtiyar Syed | Manish Gupta | Vasudeva Varma
Proceedings of the 28th International Conference on Computational Linguistics

Hoping for a large number of clicks and potentially high social shares, journalists of various news media outlets publish sensationalist headlines on social media. These headlines lure the readers to click on them and satisfy the curiosity gap in their mind. Low quality material pointed to by clickbaits leads to time wastage and annoyance for users. Even for enterprises publishing clickbaits, it hurts more than it helps as it erodes user trust, attracts wrong visitors, and produces negative signals for ranking algorithms. Hence, identifying and flagging clickbait titles is very essential. Previous work on clickbaits has majorly focused on binary classification of clickbait titles. However not all clickbaits are equally clickbaity. It is not only essential to identify a click-bait, but also to identify the intensity of the clickbait based on the strength of the clickbait. In this work, we model clickbait strength prediction as a regression problem. While previous methods have relied on traditional machine learning or vanilla recurrent neural networks, we rigorously investigate the use of transformers for clickbait strength prediction. On a benchmark dataset with ∼39K posts, our methods outperform all the existing methods in the Clickbait Challenge.

pdf
Semi-supervised Multi-task Learning for Multi-label Fine-grained Sexism Classification
Harika Abburi | Pulkit Parikh | Niyati Chhaya | Vasudeva Varma
Proceedings of the 28th International Conference on Computational Linguistics

Sexism, a form of oppression based on one’s sex, manifests itself in numerous ways and causes enormous suffering. In view of the growing number of experiences of sexism reported online, categorizing these recollections automatically can assist the fight against sexism, as it can facilitate effective analyses by gender studies researchers and government officials involved in policy making. In this paper, we investigate the fine-grained, multi-label classification of accounts (reports) of sexism. To the best of our knowledge, we work with considerably more categories of sexism than any published work through our 23-class problem formulation. Moreover, we propose a multi-task approach for fine-grained multi-label sexism classification that leverages several supporting tasks without incurring any manual labeling cost. Unlabeled accounts of sexism are utilized through unsupervised learning to help construct our multi-task setup. We also devise objective functions that exploit label correlations in the training data explicitly. Multiple proposed methods outperform the state-of-the-art for multi-label sexism classification on a recently released dataset across five standard metrics.

pdf
SIS@IIITH at SemEval-2020 Task 8: An Overview of Simple Text Classification Methods for Meme Analysis
Sravani Boinepelli | Manish Shrivastava | Vasudeva Varma
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Memes are steadily taking over the feeds of the public on social media. There is always the threat of malicious users on the internet posting offensive content, even through memes. Hence, the automatic detection of offensive images/memes is imperative along with detection of offensive text. However, this is a much more complex task as it involves both visual cues as well as language understanding and cultural/context knowledge. This paper describes our approach to the task of SemEval-2020 Task 8: Memotion Analysis. We chose to participate only in Task A which dealt with Sentiment Classification, which we formulated as a text classification problem. Through our experiments, we explored multiple training models to evaluate the performance of simple text classification algorithms on the raw text obtained after running OCR on meme images. Our submitted model achieved an accuracy of 72.69% and exceeded the existing baseline’s Macro F1 score by 8% on the official test dataset. Apart from describing our official submission, we shall elucidate how different classification models respond to this task.

pdf
Extracting Message Sequence Charts from Hindi Narrative Text
Swapnil Hingmire | Nitin Ramrakhiyani | Avinash Kumar Singh | Sangameshwar Patil | Girish Palshikar | Pushpak Bhattacharyya | Vasudeva Varma
Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events

In this paper, we propose the use of Message Sequence Charts (MSC) as a representation for visualizing narrative text in Hindi. An MSC is a formal representation allowing the depiction of actors and interactions among these actors in a scenario, apart from supporting a rich framework for formal inference. We propose an approach to extract MSC actors and interactions from a Hindi narrative. As a part of the approach, we enrich an existing event annotation scheme where we provide guidelines for annotation of the mood of events (realis vs irrealis) and guidelines for annotation of event arguments. We report performance on multiple evaluation criteria by experimenting with Hindi narratives from Indian History. Though Hindi is the fourth most-spoken first language in the world, from the NLP perspective it has comparatively lesser resources than English. Moreover, there is relatively less work in the context of event processing in Hindi. Hence, we believe that this work is among the initial works for Hindi event processing.

pdf
FINSIM20 at the FinSim Task: Making Sense of Text in Financial Domain
Vivek Anand | Yash Agrawal | Aarti Pol | Vasudeva Varma
Proceedings of the Second Workshop on Financial Technology and Natural Language Processing

2019

pdf
FERMI at SemEval-2019 Task 5: Using Sentence embeddings to Identify Hate Speech Against Immigrants and Women in Twitter
Vijayasaradhi Indurthi | Bakhtiyar Syed | Manish Shrivastava | Nikhil Chakravartula | Manish Gupta | Vasudeva Varma
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes our system (Fermi) for Task 5 of SemEval-2019: HatEval: Multilingual Detection of Hate Speech Against Immigrants and Women on Twitter. We participated in the subtask A for English and ranked first in the evaluation on the test set. We evaluate the quality of multiple sentence embeddings and explore multiple training models to evaluate the performance of simple yet effective embedding-ML combination algorithms. Our team - Fermi’s model achieved an accuracy of 65.00% for English language in task A. Our models, which use pretrained Universal Encoder sentence embeddings for transforming the input and SVM (with RBF kernel) for classification, scored first position (among 68) in the leaderboard on the test set for Subtask A in English language. In this paper we provide a detailed description of the approach, as well as the results obtained in the task.

pdf
Fermi at SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media using Sentence Embeddings
Vijayasaradhi Indurthi | Bakhtiyar Syed | Manish Shrivastava | Manish Gupta | Vasudeva Varma
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes our system (Fermi) for Task 6: OffensEval: Identifying and Categorizing Offensive Language in Social Media of SemEval-2019. We participated in all the three sub-tasks within Task 6. We evaluate multiple sentence embeddings in conjunction with various supervised machine learning algorithms and evaluate the performance of simple yet effective embedding-ML combination algorithms. Our team Fermi’s model achieved an F1-score of 64.40%, 62.00% and 62.60% for sub-task A, B and C respectively on the official leaderboard. Our model for sub-task C which uses pre-trained ELMo embeddings for transforming the input and uses SVM (RBF kernel) for training, scored third position on the official leaderboard. Through the paper we provide a detailed description of the approach, as well as the results obtained for the task.

pdf
Fermi at SemEval-2019 Task 8: An elementary but effective approach to Question Discernment in Community QA Forums
Bakhtiyar Syed | Vijayasaradhi Indurthi | Manish Shrivastava | Manish Gupta | Vasudeva Varma
Proceedings of the 13th International Workshop on Semantic Evaluation

Online Community Question Answering Forums (cQA) have gained massive popularity within recent years. The rise in users for such forums have led to the increase in the need for automated evaluation for question comprehension and fact evaluation of the answers provided by various participants in the forum. Our team, Fermi, participated in sub-task A of Task 8 at SemEval 2019 - which tackles the first problem in the pipeline of factual evaluation in cQA forums, i.e., deciding whether a posed question asks for a factual information, an opinion/advice or is just socializing. This information is highly useful in segregating factual questions from non-factual ones which highly helps in organizing the questions into useful categories and trims down the problem space for the next task in the pipeline for fact evaluation among the available answers. Our system uses the embeddings obtained from Universal Sentence Encoder combined with XGBoost for the classification sub-task A. We also evaluate other combinations of embeddings and off-the-shelf machine learning algorithms to demonstrate the efficacy of the various representations and their combinations. Our results across the evaluation test set gave an accuracy of 84% and received the first position in the final standings judged by the organizers.

pdf
Multi-label Categorization of Accounts of Sexism using a Neural Framework
Pulkit Parikh | Harika Abburi | Pinkesh Badjatiya | Radhika Krishnan | Niyati Chhaya | Manish Gupta | Vasudeva Varma
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Sexism, an injustice that subjects women and girls to enormous suffering, manifests in blatant as well as subtle ways. In the wake of growing documentation of experiences of sexism on the web, the automatic categorization of accounts of sexism has the potential to assist social scientists and policy makers in utilizing such data to study and counter sexism better. The existing work on sexism classification, which is different from sexism detection, has certain limitations in terms of the categories of sexism used and/or whether they can co-occur. To the best of our knowledge, this is the first work on the multi-label classification of sexism of any kind(s), and we contribute the largest dataset for sexism categorization. We develop a neural solution for this multi-label classification that can combine sentence representations obtained using models such as BERT with distributional and linguistic word embeddings using a flexible, hierarchical architecture involving recurrent components and optional convolutional ones. Further, we leverage unlabeled accounts of sexism to infuse domain-specific elements into our framework. The best proposed method outperforms several deep learning as well as traditional machine learning baselines by an appreciable margin.

pdf
Extraction of Message Sequence Charts from Software Use-Case Descriptions
Girish Palshikar | Nitin Ramrakhiyani | Sangameshwar Patil | Sachin Pawar | Swapnil Hingmire | Vasudeva Varma | Pushpak Bhattacharyya
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)

Software Requirement Specification documents provide natural language descriptions of the core functional requirements as a set of use-cases. Essentially, each use-case contains a set of actors and sequences of steps describing the interactions among them. Goals of use-case reviews and analyses include their correctness, completeness, detection of ambiguities, prototyping, verification, test case generation and traceability. Message Sequence Chart (MSC) have been proposed as a expressive, rigorous yet intuitive visual representation of use-cases. In this paper, we describe a linguistic knowledge-based approach to extract MSCs from use-cases. Compared to existing techniques, we extract richer constructs of the MSC notation such as timers, conditions and alt-boxes. We apply this tool to extract MSCs from several real-life software use-case descriptions and show that it performs better than the existing techniques. We also discuss the benefits and limitations of the extracted MSCs to meet the above goals.

pdf
Extraction of Message Sequence Charts from Narrative History Text
Girish Palshikar | Sachin Pawar | Sangameshwar Patil | Swapnil Hingmire | Nitin Ramrakhiyani | Harsimran Bedi | Pushpak Bhattacharyya | Vasudeva Varma
Proceedings of the First Workshop on Narrative Understanding

In this paper, we advocate the use of Message Sequence Chart (MSC) as a knowledge representation to capture and visualize multi-actor interactions and their temporal ordering. We propose algorithms to automatically extract an MSC from a history narrative. For a given narrative, we first identify verbs which indicate interactions and then use dependency parsing and Semantic Role Labelling based approaches to identify senders (initiating actors) and receivers (other actors involved) for these interaction verbs. As a final step in MSC extraction, we employ a state-of-the art algorithm to temporally re-order these interactions. Our evaluation on multiple publicly available narratives shows improvements over four baselines.

2018

pdf
ELDEN: Improved Entity Linking Using Densified Knowledge Graphs
Priya Radhakrishnan | Partha Talukdar | Vasudeva Varma
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Entity Linking (EL) systems aim to automatically map mentions of an entity in text to the corresponding entity in a Knowledge Graph (KG). Degree of connectivity of an entity in the KG directly affects an EL system’s ability to correctly link mentions in text to the entity in KG. This causes many EL systems to perform well for entities well connected to other entities in KG, bringing into focus the role of KG density in EL. In this paper, we propose Entity Linking using Densified Knowledge Graphs (ELDEN). ELDEN is an EL system which first densifies the KG with co-occurrence statistics from a large text corpus, and then uses the densified KG to train entity embeddings. Entity similarity measured using these trained entity embeddings result in improved EL. ELDEN outperforms state-of-the-art EL system on benchmark datasets. Due to such densification, ELDEN performs well for sparsely connected entities in the KG too. ELDEN’s approach is simple, yet effective. We have made ELDEN’s code and data publicly available.

pdf
When science journalism meets artificial intelligence : An interactive demonstration
Raghuram Vadapalli | Bakhtiyar Syed | Nishant Prabhu | Balaji Vasan Srinivasan | Vasudeva Varma
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We present an online interactive tool that generates titles of blog titles and thus take the first step toward automating science journalism. Science journalism aims to transform jargon-laden scientific articles into a form that the common reader can comprehend while ensuring that the underlying meaning of the article is retained. In this work, we present a tool, which, given the title and abstract of a research paper will generate a blog title by mimicking a human science journalist. The tool makes use of a model trained on a corpus of 87,328 pairs of research papers and their corresponding blogs, built from two science news aggregators. The architecture of the model is a two-stage mechanism which generates blog titles. Evaluation using standard metrics indicate the viability of the proposed system.

pdf
Identification of Alias Links among Participants in Narratives
Sangameshwar Patil | Sachin Pawar | Swapnil Hingmire | Girish Palshikar | Vasudeva Varma | Pushpak Bhattacharyya
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Identification of distinct and independent participants (entities of interest) in a narrative is an important task for many NLP applications. This task becomes challenging because these participants are often referred to using multiple aliases. In this paper, we propose an approach based on linguistic knowledge for identification of aliases mentioned using proper nouns, pronouns or noun phrases with common noun headword. We use Markov Logic Network (MLN) to encode the linguistic knowledge for identification of aliases. We evaluate on four diverse history narratives of varying complexity. Our approach performs better than the state-of-the-art approach as well as a combination of standard named entity recognition and coreference resolution techniques.

pdf
EquGener: A Reasoning Network for Word Problem Solving by Generating Arithmetic Equations
Pruthwik Mishra | Litton J Kurisinkel | Dipti Misra Sharma | Vasudeva Varma
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

pdf
A Workbench for Rapid Generation of Cross-Lingual Summaries
Nisarg Jhaveri | Manish Gupta | Vasudeva Varma
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf
Abstractive Multi-document Summarization by Partial Tree Extraction, Recombination and Linearization
Litton J Kurisinkel | Yue Zhang | Vasudeva Varma
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Existing work for abstractive multidocument summarization utilise existing phrase structures directly extracted from input documents to generate summary sentences. These methods can suffer from lack of consistence and coherence in merging phrases. We introduce a novel approach for abstractive multidocument summarization through partial dependency tree extraction, recombination and linearization. The method entrusts the summarizer to generate its own topically coherent sequential structures from scratch for effective communication. Results on TAC 2011, DUC-2004 and 2005 show that our system gives competitive results compared with state of the art abstractive summarization approaches in the literature. We also achieve competitive results in linguistic quality assessed by human evaluators.

pdf
SSAS: Semantic Similarity for Abstractive Summarization
Raghuram Vadapalli | Litton J Kurisinkel | Manish Gupta | Vasudeva Varma
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Ideally a metric evaluating an abstract system summary should represent the extent to which the system-generated summary approximates the semantic inference conceived by the reader using a human-written reference summary. Most of the previous approaches relied upon word or syntactic sub-sequence overlap to evaluate system-generated summaries. Such metrics cannot evaluate the summary at semantic inference level. Through this work we introduce the metric of Semantic Similarity for Abstractive Summarization (SSAS), which leverages natural language inference and paraphrasing techniques to frame a novel approach to evaluate system summaries at semantic inference level. SSAS is based upon a weighted composition of quantities representing the level of agreement, contradiction, independence, paraphrasing, and optionally ROUGE score between a system-generated and a human-written summary.

pdf
Keynote Lecture 3: Towards Abstractive Summarization
Vasudeva Varma
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

2016

pdf
Towards Sub-Word Level Compositions for Sentiment Analysis of Hindi-English Code Mixed Text
Aditya Joshi | Ameya Prabhu | Manish Shrivastava | Vasudeva Varma
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Sentiment analysis (SA) using code-mixed data from social media has several applications in opinion mining ranging from customer satisfaction to social campaign analysis in multilingual societies. Advances in this area are impeded by the lack of a suitable annotated dataset. We introduce a Hindi-English (Hi-En) code-mixed dataset for sentiment analysis and perform empirical analysis comparing the suitability and performance of various state-of-the-art SA methods in social media. In this paper, we introduce learning sub-word level representations in our LSTM (Subword-LSTM) architecture instead of character-level or word-level representations. This linguistic prior in our architecture enables us to learn the information about sentiment value of important morphemes. This also seems to work well in highly noisy text containing misspellings as shown in our experiments which is demonstrated in morpheme-level feature maps learned by our model. Also, we hypothesize that encoding this linguistic prior in the Subword-LSTM architecture leads to the superior performance. Our system attains accuracy 4-5% greater than traditional approaches on our dataset, and also outperforms the available system for sentiment analysis in Hi-En code-mixed text by 18%.

pdf
Non-decreasing Sub-modular Function for Comprehensible Summarization
Litton J Kurisinkel | Pruthwik Mishra | Vigneshwaran Muralidaran | Vasudeva Varma | Dipti Misra Sharma
Proceedings of the NAACL Student Research Workshop

2015

pdf
IIIT-H at SemEval 2015: Twitter Sentiment Analysis – The Good, the Bad and the Neutral!
Ayushi Dalmia | Manish Gupta | Vasudeva Varma
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf
Sentibase: Sentiment Analysis in Twitter on a Budget
Satarupa Guha | Aditya Joshi | Vasudeva Varma
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf
SIEL: Aspect Based Sentiment Analysis in Reviews
Satarupa Guha | Aditya Joshi | Vasudeva Varma
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

pdf
A Sandhi Splitter for Malayalam
Devadath V V | Litton J Kurisinkel | Dipti Misra Sharma | Vasudeva Varma
Proceedings of the 11th International Conference on Natural Language Processing

pdf
Enrichment of Bilingual Dictionary through News Stream Data
Ajay Dubey | Parth Gupta | Vasudeva Varma | Paolo Rosso
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Bilingual dictionaries are the key component of the cross-lingual similarity estimation methods. Usually such dictionary generation is accomplished by manual or automatic means. Automatic generation approaches include to exploit parallel or comparable data to derive dictionary entries. Such approaches require large amount of bilingual data in order to produce good quality dictionary. Many time the language pair does not have large bilingual comparable corpora and in such cases the best automatic dictionary is upper bounded by the quality and coverage of such corpora. In this work we propose a method which exploits continuous quasi-comparable corpora to derive term level associations for enrichment of such limited dictionary. Though we propose our experiments for English and Hindi, our approach can be easily extendable to other languages. We evaluated dictionary by manually computing the precision. In experiments we show our approach is able to derive interesting term level associations across languages.

2013

pdf
sielers : Feature Analysis and Polarity Classification of Expressions from Twitter and SMS Data
Harshit Jain | Aditya Mogadala | Vasudeva Varma
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

pdf
Hindi Subjective Lexicon: A Lexical Resource for Hindi Adjective Polarity Classification
Akshat Bakliwal | Piyush Arora | Vasudeva Varma
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

With recent developments in web technologies, percentage web content in Hindi is growing up at a lighting speed. This information can prove to be very useful for researchers, governments and organization to learn what's on public mind, to make sound decisions. In this paper, we present a graph based wordnet expansion method to generate a full (adjective and adverb) subjective lexicon. We used synonym and antonym relations to expand the initial seed lexicon. We show three different evaluation strategies to validate the lexicon. We achieve 70.4% agreement with human annotators and ∼79% accuracy on product review classification. Main contribution of our work 1) Developing a lexicon of adjectives and adverbs with polarity scores using Hindi Wordnet. 2) Developing an annotated corpora of Hindi Product Reviews.

pdf
Language Independent Sentence-Level Subjectivity Analysis with Feature Selection
Aditya Mogadala | Vasudeva Varma
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation

pdf
Mining Sentiments from Tweets
Akshat Bakliwal | Piyush Arora | Senthil Madhappan | Nikhil Kapre | Mukesh Singh | Vasudeva Varma
Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis

pdf
Language Independent Named Entity Identification using Wikipedia
Mahathi Bhagavatula | Santosh GSK | Vasudeva Varma
Proceedings of the First Workshop on Multilingual Modeling

pdf
Entity Centric Opinion Mining from Blogs
Akshat Bakliwal | Piyush Arora | Vasudeva Varma
Proceedings of the 2nd Workshop on Sentiment Analysis where AI meets Psychology

2011

pdf
Domain Independent Model for Product Attribute Extraction from User Reviews using Wikipedia
Sudheer Kovelamudi | Sethu Ramalingam | Arpit Sood | Vasudeva Varma
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf
Language-Independent Context Aware Query Translation using Wikipedia
Rohit Bharadwaj G | Vasudeva Varma
Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web

pdf
Towards Enhanced Opinion Classification using NLP Techniques.
Akshat Bakliwal | Piyush Arora | Ankit Patil | Vasudeva Varma
Proceedings of the Workshop on Sentiment Analysis where AI meets Psychology (SAAIP 2011)

2010

pdf
Generating Simulated Relevance Feedback: A Prognostic Search approach
Nithin Kumar | Vasudeva Varma
Coling 2010: Posters

2009

pdf
Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies (CLIAWS3)
Sivaji Bandyopadhyay | Pushpak Bhattacharyya | Vasudeva Varma | Sudeshna Sarkar | A Kumaran | Raghavendra Udupa
Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies (CLIAWS3)

pdf
Sentence Position revisited: A robust light-weight Update Summarization ‘baseline’ Algorithm
Rahul Katragadda | Prasad Pingali | Vasudeva Varma
Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies (CLIAWS3)

pdf
A Language-Independent Transliteration Schema Using Character Aligned Models at NEWS 2009
Praneeth Shishtla | Surya Ganesh V | Sethuramalingam Subramaniam | Vasudeva Varma
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

pdf
Exploiting the Use of Prior Probabilities for Passage Retrieval in Question Answering
Surya Ganesh | Vasudeva Varma
Proceedings of the International Conference RANLP-2009

pdf
Exploiting Structure and Content of Wikipedia for Query Expansion in the Context
Surya Ganesh | Vasudeva Varma
Proceedings of the International Conference RANLP-2009

pdf
Passage Retrieval Using Answer Type Profiles in Question Answering
Surya Ganesh Veeravalli | Vasudeva Varma
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

pdf
Query-Focused Summaries or Query-Biased Summaries?
Rahul Katragadda | Vasudeva Varma
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

pdf
Statistical Machine Translation Models for Personalized Search
Rohini U | Vamshi Ambati | Vasudeva Varma
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

pdf
A Character n-gram Based Approach for Improved Recall in Indian Language NER
Praneeth M Shishtla | Prasad Pingali | Vasudeva Varma
Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages

pdf
Experiments in Telugu NER: A Conditional Random Field Approach
Praneeth M Shishtla | Karthik Gali | Prasad Pingali | Vasudeva Varma
Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages

pdf
Statistical Transliteration for Cross Language Information Retrieval using HMM alignment model and CRF
Prasad Pingali | Surya Ganesh | Sreeharsha Yella | Vasudeva Varma
Proceedings of the 2nd workshop on Cross Lingual Information Access (CLIA) Addressing the Information Need of Multilingual Societies

pdf
Hindi and Telugu to English CLIR using Query Expansion
Prasad Pingali | Vasudeva Varma
Proceedings of the 2nd workshop on Cross Lingual Information Access (CLIA) Addressing the Information Need of Multilingual Societies

Search