What’s in a Name? Are BERT Named Entity Representations just as Good for any other Name?
Sriram Balasubramanian
Naman Jain
Gaurav Jindal
Abhijeet Awasthi
Sunita Sarawagi
Proceedings of the 5th Workshop on Representation Learning for NLP
We evaluate named entity representations of BERT-based NLP models by investigating their robustness to replacements from the same typed class in the input. We highlight that on several tasks, while such perturbations are natural, state-of-the-art trained models are surprisingly brittle. The brittleness continues even with the recent entity-aware BERT models. We also try to discern the cause of this non-robustness, considering factors such as tokenization and frequency of occurrence. Then we provide a simple method that ensembles predictions from multiple replacements while jointly modeling the uncertainty of type annotations and label predictions. Experiments on three NLP tasks show that our method enhances robustness and increases accuracy on both natural and adversarial datasets.
A Multi-Dimensional View of Aggression when voicing Opinion
Arjit Srivastava
Avijit Vajpayee
Syed Sarfaraz Akhtar
Naman Jain
Vinay Singh
Manish Shrivastava
Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying
The advent of social media has immensely proliferated the amount of opinions and arguments voiced on the internet. These virtual debates often present cases of aggression. While research has been focused largely on analyzing aggression and stance in isolation from each other, this work is the first attempt to gain an extensive and fine-grained understanding of patterns of aggression and figurative language use when voicing opinion. We present a Hindi-English code-mixed dataset of opinion on the politico-social issue of ‘2016 India banknote demonetisation’ and annotate it across multiple dimensions such as aggression, hate speech, emotion arousal and figurative language usage (such as sarcasm/irony, metaphors/similes, puns/word-play).
A House United: Bridging the Script and Lexical Barrier between Hindi and Urdu
Riyaz A. Bhat
Irshad A. Bhat
Naman Jain
Dipti Misra Sharma
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
In Computational Linguistics, Hindi and Urdu are not viewed as a monolithic entity and have received separate attention with respect to their text processing. From part-of-speech tagging to machine translation, models are separately trained for both Hindi and Urdu despite the fact that they represent the same language. The main reasons are their divergent literary vocabularies and separate orthographies, and probably also their political status and the social perception that they are two separate languages. In this article, we propose a simple but efficient approach to bridge the lexical and orthographic differences between Hindi and Urdu texts. With respect to text processing, addressing the differences between the Hindi and Urdu texts would be beneficial in the following ways: (a) instead of training separate models, their individual resources can be augmented to train single, unified models for better generalization, and (b) their individual text processing applications can be used interchangeably under varied resource conditions. To remove the script barrier, we learn accurate statistical transliteration models which use sentence-level decoding to resolve word ambiguity. Similarly, we learn cross-register word embeddings from the harmonized Hindi and Urdu corpora to nullify their lexical divergences. As a proof of concept, we evaluate our approach on Hindi and Urdu dependency parsing under two scenarios: (a) resource sharing, and (b) resource augmentation. We demonstrate that a neural network-based dependency parser trained on augmented, harmonized Hindi and Urdu resources performs significantly better than the parsing models trained separately on the individual resources. We also show that we can achieve near state-of-the-art results when the parsers are used interchangeably.
Language Identification in Code-Switching Scenario
Naman Jain
Riyaz Ahmad Bhat
Proceedings of the First Workshop on Computational Approaches to Code Switching
Adapting Predicate Frames for Urdu PropBanking
Riyaz Ahmad Bhat
Naman Jain
Ashwini Vaidya
Martha Palmer
Tafseer Ahmed Khan
Dipti Misra Sharma
James Babani
Proceedings of the EMNLP’2014 Workshop on Language Technology for Closely Related Languages and Language Variants
Effective Parsing for Human Aided NLP Systems
Naman Jain
Sambhav Jain
Proceedings of the 13th International Conference on Parsing Technologies (IWPT 2013)
Exploring Semantic Information in Hindi WordNet for Hindi Dependency Parsing
Sambhav Jain
Naman Jain
Aniruddha Tammewar
Riyaz Ahmad Bhat
Dipti Sharma
Proceedings of the Sixth International Joint Conference on Natural Language Processing
Two-stage Approach for Hindi Dependency Parsing Using MaltParser
Naman Jain
Karan Singla
Aniruddha Tammewar
Sambhav Jain
Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages