Nishant Kambhatla


2023

pdf
Learning Nearest Neighbour Informed Latent Word Embeddings to Improve Zero-Shot Machine Translation
Nishant Kambhatla | Logan Born | Anoop Sarkar
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

Multilingual neural translation models exploit cross-lingual transfer to perform zero-shot translation between unseen language pairs. Past efforts to improve cross-lingual transfer have focused on aligning contextual sentence-level representations. This paper introduces three novel contributions to allow exploiting nearest neighbours at the token level during training, including: (i) an efficient, gradient-friendly way to share representations between neighboring tokens; (ii) an attentional semantic layer which extracts latent features from shared embeddings; and (iii) an agreement loss to harmonize predictions across different sentence representations. Experiments on two multilingual datasets demonstrate consistent gains in zero shot translation over strong baselines.

pdf
Language Model Based Target Token Importance Rescaling for Simultaneous Neural Machine Translation
Aditi Jain | Nishant Kambhatla | Anoop Sarkar
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

The decoder in simultaneous neural machine translation receives limited information from the source while having to balance the opposing requirements of latency versus translation quality. In this paper, we use an auxiliary target-side language model to augment the training of the decoder model. Under this notion of target adaptive training, generating rare or difficult tokens is rewarded which improves the translation quality while reducing latency. The predictions made by a language model in the decoder are combined with the traditional cross entropy loss which frees up the focus on the source side context. Our experimental results over multiple language pairs show that compared to previous state of the art methods in simultaneous translation, we can use an augmented target side context to improve BLEU scores significantly. We show improvements over the state of the art in the low latency range with lower average lagging values (faster output).

pdf
Decipherment as Regression: Solving Historical Substitution Ciphers by Learning Symbol Recurrence Relations
Nishant Kambhatla | Logan Born | Anoop Sarkar
Findings of the Association for Computational Linguistics: EACL 2023

Solving substitution ciphers involves mapping sequences of cipher symbols to fluent text in a target language. This has conventionally been formulated as a search problem, to find the decipherment key using a character-level language model to constrain the search space. This work instead frames decipherment as a sequence prediction task, using a Transformer-based causal language model to learn recurrences between characters in a ciphertext. We introduce a novel technique for transcribing arbitrary substitution ciphers into a common recurrence encoding. By leveraging this technique, we (i) create a large synthetic dataset of homophonic ciphers using random keys, and (ii) train a decipherment model that predicts the plaintext sequence given a recurrence-encoded ciphertext. Our method achieves strong results on synthetic 1:1 and homophonic ciphers, and cracks several real historic homophonic ciphers. Our analysis shows that the model learns recurrence relations between cipher symbols and recovers decipherment keys in its self-attention.

2022

pdf
CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation
Nishant Kambhatla | Logan Born | Anoop Sarkar
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We propose a novel data-augmentation technique for neural machine translation based on ROT-k ciphertexts. ROT-k is a simple letter substitution cipher that replaces a letter in the plaintext with the kth letter after it in the alphabet. We first generate multiple ROT-k ciphertexts using different values of k for the plaintext which is the source side of the parallel data. We then leverage this enciphered training data along with the original parallel data via multi-source training to improve neural machine translation. Our method, CipherDAug, uses a co-regularization-inspired training procedure, requires no external data sources other than the original training data, and uses a standard Transformer to outperform strong data augmentation techniques on several datasets by a significant margin. This technique combines easily with existing approaches to data augmentation, and yields particularly strong results in low-resource settings.

pdf
Auxiliary Subword Segmentations as Related Languages for Low Resource Multilingual Translation
Nishant Kambhatla | Logan Born | Anoop Sarkar
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

We propose a novel technique that combines alternative subword tokenizations of a single source-target language pair that allows us to leverage multilingual neural translation training methods. These alternate segmentations function like related languages in multilingual translation. Overall this improves translation accuracy for low-resource languages and produces translations that are lexically diverse and morphologically rich. We also introduce a cross-teaching technique which yields further improvements in translation accuracy and cross-lingual transfer between high- and low-resource language pairs. Compared to other strong multilingual baselines, our approach yields average gains of +1.7 BLEU across the four low-resource datasets from the multilingual TED-talks dataset. Our technique does not require additional training data and is a drop-in improvement for any existing neural translation system.

2021

pdf
Measuring and Improving Faithfulness of Attention in Neural Machine Translation
Pooya Moradi | Nishant Kambhatla | Anoop Sarkar
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

While the attention heatmaps produced by neural machine translation (NMT) models seem insightful, there is little evidence that they reflect a model’s true internal reasoning. We provide a measure of faithfulness for NMT based on a variety of stress tests where attention weights which are crucial for prediction are perturbed and the model should alter its predictions if the learned weights are a faithful explanation of the predictions. We show that our proposed faithfulness measure for NMT models can be improved using a novel differentiable objective that rewards faithful behaviour by the model through probability divergence. Our experimental results on multiple language pairs show that our objective function is effective in increasing faithfulness and can lead to a useful analysis of NMT model behaviour and more trustworthy attention heatmaps. Our proposed objective improves faithfulness without reducing the translation quality and has a useful regularization effect on the NMT model and can even improve translation quality in some cases.

2020

pdf
Training with Adversaries to Improve Faithfulness of Attention in Neural Machine Translation
Pooya Moradi | Nishant Kambhatla | Anoop Sarkar
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: Student Research Workshop

Can we trust that the attention heatmaps produced by a neural machine translation (NMT) model reflect its true internal reasoning? We isolate and examine in detail the notion of faithfulness in NMT models. We provide a measure of faithfulness for NMT based on a variety of stress tests where model parameters are perturbed and measuring faithfulness based on how often the model output changes. We show that our proposed faithfulness measure for NMT models can be improved using a novel differentiable objective that rewards faithful behaviour by the model through probability divergence. Our experimental results on multiple language pairs show that our objective function is effective in increasing faithfulness and can lead to a useful analysis of NMT model behaviour and more trustworthy attention heatmaps. Our proposed objective improves faithfulness without reducing the translation quality and it also seems to have a useful regularization effect on the NMT model and can even improve translation quality in some cases.

2019

pdf
Sign Clustering and Topic Extraction in Proto-Elamite
Logan Born | Kate Kelley | Nishant Kambhatla | Carolyn Chen | Anoop Sarkar
Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

We describe a first attempt at using techniques from computational linguistics to analyze the undeciphered proto-Elamite script. Using hierarchical clustering, n-gram frequencies, and LDA topic models, we both replicate results obtained by manual decipherment and reveal previously-unobserved relationships between signs. This demonstrates the utility of these techniques as an aid to manual decipherment.

pdf
Interrogating the Explanatory Power of Attention in Neural Machine Translation
Pooya Moradi | Nishant Kambhatla | Anoop Sarkar
Proceedings of the 3rd Workshop on Neural Generation and Translation

Attention models have become a crucial component in neural machine translation (NMT). They are often implicitly or explicitly used to justify the model’s decision in generating a specific token but it has not yet been rigorously established to what extent attention is a reliable source of information in NMT. To evaluate the explanatory power of attention for NMT, we examine the possibility of yielding the same prediction but with counterfactual attention models that modify crucial aspects of the trained attention model. Using these counterfactual attention mechanisms we assess the extent to which they still preserve the generation of function and content words in the translation process. Compared to a state of the art attention model, our counterfactual attention models produce 68% of function words and 21% of content words in our German-English dataset. Our experiments demonstrate that attention models by themselves cannot reliably explain the decisions made by a NMT model.

2018

pdf
Decipherment for Adversarial Offensive Language Detection
Zhelun Wu | Nishant Kambhatla | Anoop Sarkar
Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)

Automated filters are commonly used by online services to stop users from sending age-inappropriate, bullying messages, or asking others to expose personal information. Previous work has focused on rules or classifiers to detect and filter offensive messages, but these are vulnerable to cleverly disguised plaintext and unseen expressions especially in an adversarial setting where the users can repeatedly try to bypass the filter. In this paper, we model the disguised messages as if they are produced by encrypting the original message using an invented cipher. We apply automatic decipherment techniques to decode the disguised malicious text, which can be then filtered using rules or classifiers. We provide experimental results on three different datasets and show that decipherment is an effective tool for this task.

pdf
Decipherment of Substitution Ciphers with Neural Language Models
Nishant Kambhatla | Anahita Mansouri Bigvand | Anoop Sarkar
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Decipherment of homophonic substitution ciphers using language models is a well-studied task in NLP. Previous work in this topic scores short local spans of possible plaintext decipherments using n-gram language models. The most widely used technique is the use of beam search with n-gram language models proposed by Nuhn et al.(2013). We propose a beam search algorithm that scores the entire candidate plaintext at each step of the decipherment using a neural language model. We augment beam search with a novel rest cost estimation that exploits the prediction power of a neural language model. We compare against the state of the art n-gram based methods on many different decipherment tasks. On challenging ciphers such as the Beale cipher we provide significantly better error rates with much smaller beam sizes.