Jugal Kalita

Also published as: J.K. Kalita, Jugal K. Kalita


2020

Generalization to Mitigate Synonym Substitution Attacks
Basemah Alshemali | Jugal Kalita
Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

Studies have shown that deep neural networks (DNNs) are vulnerable to adversarial examples – perturbed inputs that cause DNN-based models to produce incorrect results. One robust adversarial attack in the NLP domain is synonym substitution. In attacks of this variety, the adversary substitutes words with their synonyms. Since synonym substitution perturbations aim to satisfy all lexical, grammatical, and semantic constraints, they are difficult to detect both with automatic syntax checks and by humans. In this paper, we propose a structure-free defensive method that improves the performance of DNN-based models on both clean and adversarial data. Our findings show that replacing the embeddings of the important words in the input samples with the average of their synonyms' embeddings can significantly improve the generalization of DNN-based classifiers. By doing so, we reduce model sensitivity to particular words in the input samples. Our results indicate that the proposed defense not only defends against adversarial attacks, but also improves the performance of DNN-based models when tested on benign data. On average, the proposed defense improved the classification accuracy of the CNN and Bi-LSTM models by 41.30% and 55.66%, respectively, when tested under adversarial attacks. Extended investigation shows that our defensive method can also improve the robustness of non-neural models, achieving average classification accuracy increases of 17.62% and 22.93% on the SVM and XGBoost models, respectively. The proposed defensive method also yields an average classification accuracy improvement of 26.60% when tested with the well-known BERT model. Our algorithm is generic enough to be applied in any NLP domain and to any model trained on any natural language.
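The core defense described in this abstract, replacing an important word's embedding with the mean of its synonyms' embeddings, can be sketched as follows. This is a minimal illustration, not the authors' code; the toy embedding table, the synonym lookup, and the 2-dimensional vectors are all hypothetical stand-ins.

```python
def average_synonym_embedding(word, embeddings, synonyms):
    """Replace a word's vector with the mean of its embedded synonyms' vectors.

    Falls back to the word's own vector when none of its synonyms is embedded.
    """
    vecs = [embeddings[s] for s in synonyms.get(word, []) if s in embeddings]
    if not vecs:
        return embeddings[word]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Toy 2-d embeddings (hypothetical values, for illustration only).
embeddings = {
    "good": [1.0, 0.0],
    "great": [0.8, 0.2],
    "fine": [0.6, 0.4],
}
synonyms = {"good": ["great", "fine"]}

# "good" now maps to the mean of the "great" and "fine" vectors (about [0.7, 0.3]),
# so an attacker swapping "good" for a synonym changes the input vector less.
smoothed = average_synonym_embedding("good", embeddings, synonyms)
```

Averaging blurs the distinction between a word and its synonyms at the representation level, which is exactly why a synonym-substitution perturbation has less effect downstream.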

Solving Arithmetic Word Problems Using Transformer and Pre-processing of Problem Texts
Kaden Griffith | Jugal Kalita
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

This paper outlines the use of Transformer networks trained to translate math word problems to equivalent arithmetic expressions in infix, prefix, and postfix notations. We compare results produced by a large number of neural configurations and find that most configurations outperform previously reported approaches on three of four datasets with significant increases in accuracy of over 20 percentage points. The best neural approaches boost accuracy by 30% on average when compared to the previous state-of-the-art.
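The target notations the abstract mentions can be illustrated with a standard conversion: the shunting-yard algorithm turns a tokenized infix expression into postfix. This sketch is my own illustration of the notation, not the paper's preprocessing pipeline; the tokenization-by-whitespace and the precedence table are assumptions.

```python
# Operator precedence for the four arithmetic operators (an assumption;
# sufficient for simple word-problem expressions).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def infix_to_postfix(tokens):
    """Convert a tokenized infix expression to postfix via shunting-yard."""
    out, ops = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of equal or higher precedence before pushing.
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                out.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                out.append(ops.pop())
            ops.pop()  # discard the matching "("
        else:  # operand (a number from the word problem)
            out.append(tok)
    while ops:
        out.append(ops.pop())
    return out

print(infix_to_postfix("3 + 4 * 2".split()))  # ['3', '4', '2', '*', '+']
```

Prefix and postfix forms remove the need for parentheses, which is one reason they are attractive output targets for a sequence-to-sequence Transformer.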

Language Model Metrics and Procrustes Analysis for Improved Vector Transformation of NLP Embeddings
Thomas Conley | Jugal Kalita
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

Artificial neural networks are mathematical models at their core. This truism presents some fundamental difficulty when networks are tasked with Natural Language Processing. A key problem lies in measuring the similarity or distance among vectors in NLP embedding space, since the mathematical concept of distance does not always agree with the linguistic one. We suggest that the best way to measure linguistic distance among vectors is to employ the Language Model (LM) that created them. We introduce Language Model Distance (LMD) for measuring the accuracy of vector transformations based on the Distributional Hypothesis (LMD Accuracy). We show the efficacy of this metric by applying it to a simple neural network that learns the Procrustes algorithm for bilingual word mapping.
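The Procrustes step referenced here has a well-known closed form: the orthogonal map W minimizing ||XW - Y|| is obtained from the SVD of XᵀY. The sketch below shows that step only; the paper's LMD metric is not reproduced, and the toy matrices are illustrative assumptions.

```python
import numpy as np

def procrustes_map(X, Y):
    """Solve min_W ||X @ W - Y||_F subject to W orthogonal.

    Closed form: if U S Vt = svd(X.T @ Y), then W = U @ Vt.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy bilingual setup: "target-language" vectors Y are the "source" vectors X
# rotated by 90 degrees; Procrustes recovers that rotation exactly.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
R = np.array([[0.0, -1.0], [1.0, 0.0]])  # 90-degree rotation
Y = X @ R
W = procrustes_map(X, Y)
assert np.allclose(X @ W, Y)
```

Because W is constrained to be orthogonal, the mapping preserves distances and angles among the source embeddings, which is the usual motivation for Procrustes in bilingual word mapping.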

2019

Introducing Aspects of Creativity in Automatic Poetry Generation
Brendan Bena | Jugal Kalita
Proceedings of the 16th International Conference on Natural Language Processing

Poetry generation involves teaching systems to automatically generate text that resembles poetic work. A deep learning system can learn to generate poetry on its own by training on a corpus of poems and modeling the particular style of language. In this paper, we propose an approach that fine-tunes GPT-2, a pre-trained language model, for our downstream task of poetry generation. We extend prior work on poetry generation by introducing creative elements. Specifically, we generate poems that express emotion and elicit the same in readers, and poems that use the language of dreams, called dream poetry. We are able to produce poems that correctly elicit the emotions of sadness and joy 87.5% and 85% of the time, respectively. We produce dreamlike poetry by training on a corpus of texts that describe dreams. Poems from this model are shown to capture elements of dream poetry with scores of no less than 3.2 on the Likert scale. We perform crowdsourced human evaluation for all our poems. We also make use of the Coh-Metrix tool, outlining the metrics we use to gauge the quality of the generated text.

2018

Genre Identification and the Compositional Effect of Genre in Literature
Joseph Worsham | Jugal Kalita
Proceedings of the 27th International Conference on Computational Linguistics

Recent advances in Natural Language Processing are finding ways to place an emphasis on the hierarchical nature of text rather than representing language as a flat sequence or unordered collection of words or letters. A human reader must capture multiple levels of abstraction and meaning in order to formulate an understanding of a document. In this paper, we address the problem of developing approaches capable of working with extremely large and complex literary documents to perform genre identification. The task is to assign a literary classification to a full-length book belonging to a corpus of literature, where the works are on average well over 200,000 words long and genre is an abstract thematic concept. We introduce the Gutenberg Dataset for Genre Identification. Additionally, we present a study of how current deep learning models compare to traditional methods on this task. The results are presented as a baseline, along with findings on how an ensemble of chapters can significantly improve results for deep learning methods. The motivation behind the chapter-ensemble method lies in the compositionality of the subtexts that make up a larger work and contribute to its overall genre.
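The chapter-ensemble idea can be sketched as simple majority voting over per-chapter predictions. This is my illustration of the aggregation step only; the keyword-based toy classifier is a hypothetical stand-in, not the paper's deep learning models.

```python
from collections import Counter

def book_genre(chapters, classify_chapter):
    """Classify each chapter independently, then majority-vote a book label."""
    votes = Counter(classify_chapter(ch) for ch in chapters)
    return votes.most_common(1)[0][0]

# Toy stand-in classifier (hypothetical keyword rule, for illustration only).
def toy_classifier(text):
    return "science-fiction" if "starship" in text else "western"

chapters = ["a starship landed", "dusty saloon doors", "the starship left"]
print(book_genre(chapters, toy_classifier))  # science-fiction
```

Voting over chapters keeps each classifier input at a tractable length while still letting the dominant theme across subtexts determine the book-level label.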

Isolated and Ensemble Audio Preprocessing Methods for Detecting Adversarial Examples against Automatic Speech Recognition
Krishan Rajaratnam | Kunal Shah | Jugal Kalita
Proceedings of the 30th Conference on Computational Linguistics and Speech Processing (ROCLING 2018)

2017

Neural Networks for Semantic Textual Similarity
Derek Prijatelj | Jugal Kalita | Jonathan Ventura
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

Open Set Text Classification Using CNNs
Sridhama Prakhya | Vinodini Venkataram | Jugal Kalita
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

2016

Enhancing Automatic Wordnet Construction Using Word Embeddings
Feras Al Tarouti | Jugal Kalita
Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP

Integrating WordNet for Multiple Sense Embeddings in Vector Semantics
David Foley | Jugal Kalita
Proceedings of the 13th International Conference on Natural Language Processing

Composition of Compound Nouns Using Distributional Semantics
Kyra Yee | Jugal Kalita
Proceedings of the 13th International Conference on Natural Language Processing

2015

Phrase translation using a bilingual dictionary and n-gram data: A case study from Vietnamese to English
Khang Nhut Lam | Feras Al Tarouti | Jugal Kalita
Proceedings of the 11th Workshop on Multiword Expressions

2014

Creating Lexical Resources for Endangered Languages
Khang Nhut Lam | Feras Al Tarouti | Jugal Kalita
Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages

Automatically constructing Wordnet Synsets
Khang Nhut Lam | Feras Al Tarouti | Jugal Kalita
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

Better Twitter Summaries?
Joel Judd | Jugal Kalita
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Creating Reverse Bilingual Dictionaries
Khang Nhut Lam | Jugal Kalita
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

Summarization of Historical Articles Using Temporal Event Clustering
James Gung | Jugal Kalita
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Multi-objective Optimization for Efficient Brahmic Keyboards
Albert Brouillette | Devraj Sarmah | Jugal Kalita
Proceedings of the Second Workshop on Advances in Text Input Methods

2010

Summarizing Microblogs Automatically
Beaux Sharifi | Mark-Anthony Hutton | Jugal Kalita
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

Part of Speech Tagger for Assamese Text
Navanath Saharia | Dhrubajyoti Das | Utpal Sharma | Jugal Kalita
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2002

Unsupervised Learning of Morphology for Building Lexicon for a Highly Inflectional Language
Utpal Sharma | Jugal Kalita | Rajib Das
Proceedings of the ACL-02 Workshop on Morphological and Phonological Learning

1988

Automatically Generating Natural Language Reports in an Office Environment
Jugal Kalita | Sunil Shende
Second Conference on Applied Natural Language Processing

1986

Summarizing Natural Language Database Responses
Jugal K. Kalita | Marlene L. Jones | Gordon I. McCalla
Computational Linguistics (formerly the American Journal of Computational Linguistics), Volume 12, Number 2, April–June 1986

1984

A Response to the Need for Summary Responses
J.K. Kalita | M.J. Colbourn | G.I. McCalla
10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics