Naman Jain


A Multi-Dimensional View of Aggression when voicing Opinion
Arjit Srivastava | Avijit Vajpayee | Syed Sarfaraz Akhtar | Naman Jain | Vinay Singh | Manish Shrivastava
Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying

The advent of social media has greatly increased the number of opinions and arguments voiced on the internet. These virtual debates often present cases of aggression. While research has largely analyzed aggression and stance in isolation from each other, this work is the first attempt to gain an extensive and fine-grained understanding of patterns of aggression and figurative language use when voicing opinion. We present a Hindi-English code-mixed dataset of opinions on the politico-social issue of the ‘2016 India banknote demonetisation’ and annotate it across multiple dimensions, such as aggression, hate speech, emotion arousal, and figurative language usage (e.g., sarcasm/irony, metaphors/similes, puns/word-play).

What’s in a Name? Are BERT Named Entity Representations just as Good for any other Name?
Sriram Balasubramanian | Naman Jain | Gaurav Jindal | Abhijeet Awasthi | Sunita Sarawagi
Proceedings of the 5th Workshop on Representation Learning for NLP

We evaluate the named entity representations of BERT-based NLP models by investigating their robustness to replacing entities in the input with other entities of the same type. We show that on several tasks, although such perturbations are natural, state-of-the-art trained models are surprisingly brittle. The brittleness persists even with recent entity-aware BERT models. We also try to discern the cause of this non-robustness, considering factors such as tokenization and frequency of occurrence. We then propose a simple method that ensembles predictions from multiple replacements while jointly modeling the uncertainty of type annotations and label predictions. Experiments on three NLP tasks show that our method improves robustness and increases accuracy on both natural and adversarial datasets.


A House United: Bridging the Script and Lexical Barrier between Hindi and Urdu
Riyaz A. Bhat | Irshad A. Bhat | Naman Jain | Dipti Misra Sharma
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In computational linguistics, Hindi and Urdu are not viewed as a monolithic entity and have received separate attention with respect to their text processing. From part-of-speech tagging to machine translation, models are trained separately for Hindi and Urdu despite the fact that they represent the same language. The main reasons are their divergent literary vocabularies and separate orthographies, and probably also their political status and the social perception that they are two separate languages. In this article, we propose a simple but efficient approach to bridge the lexical and orthographic differences between Hindi and Urdu texts. For text processing, addressing these differences would be beneficial in two ways: (a) instead of training separate models, their individual resources can be combined to train a single, unified model that generalizes better, and (b) their individual text processing applications can be used interchangeably under varied resource conditions. To remove the script barrier, we learn accurate statistical transliteration models that use sentence-level decoding to resolve word ambiguity. Similarly, to nullify their lexical divergences, we learn cross-register word embeddings from the harmonized Hindi and Urdu corpora. As a proof of concept, we evaluate our approach on Hindi and Urdu dependency parsing under two scenarios: (a) resource sharing and (b) resource augmentation. We demonstrate that a neural network-based dependency parser trained on augmented, harmonized Hindi and Urdu resources performs significantly better than parsing models trained separately on the individual resources. We also show that near state-of-the-art results can be achieved when the parsers are used interchangeably.


Language Identification in Code-Switching Scenario
Naman Jain | Riyaz Ahmad Bhat
Proceedings of the First Workshop on Computational Approaches to Code Switching

Adapting Predicate Frames for Urdu PropBanking
Riyaz Ahmad Bhat | Naman Jain | Ashwini Vaidya | Martha Palmer | Tafseer Ahmed Khan | Dipti Misra Sharma | James Babani
Proceedings of the EMNLP’2014 Workshop on Language Technology for Closely Related Languages and Language Variants


Effective Parsing for Human Aided NLP Systems
Naman Jain | Sambhav Jain
Proceedings of the 13th International Conference on Parsing Technologies (IWPT 2013)

Exploring Semantic Information in Hindi WordNet for Hindi Dependency Parsing
Sambhav Jain | Naman Jain | Aniruddha Tammewar | Riyaz Ahmad Bhat | Dipti Sharma
Proceedings of the Sixth International Joint Conference on Natural Language Processing


Two-stage Approach for Hindi Dependency Parsing Using MaltParser
Naman Jain | Karan Singla | Aniruddha Tammewar | Sambhav Jain
Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages