Siddhant Garg


2021

Will this Question be Answered? Question Filtering via Answer Model Distillation for Efficient Question Answering
Siddhant Garg | Alessandro Moschitti
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In this paper, we propose a novel approach to improving the efficiency of Question Answering (QA) systems by filtering out questions that they will not answer. This is based on an interesting new finding: the answer confidence scores of state-of-the-art QA systems can be approximated well by models that use only the input question text. This enables preemptive filtering of questions that the system would not answer because their answer confidence scores fall below the system threshold. Specifically, we learn Transformer-based question models by distilling Transformer-based answering models. Our experiments on three popular QA datasets and one industrial QA benchmark demonstrate that our question models approximate the Precision/Recall curves of the target QA system well. Used as filters, these question models can trade a small loss in Recall for a large reduction in QA-system computation, e.g., cutting computation by ~60% while losing only ~3-4% of Recall.
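
To make the distillation recipe concrete, here is a minimal sketch, assuming a small question-only Transformer regressed onto confidence scores precomputed by the full QA system. The names (QuestionFilter, distill_step, should_answer) and all architecture and hyperparameter choices are illustrative assumptions, not the paper's released code.

import torch
import torch.nn as nn

class QuestionFilter(nn.Module):
    """Question-only Transformer that predicts the QA system's answer confidence."""
    def __init__(self, vocab_size=30522, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, question_ids):                     # (batch, seq_len)
        hidden = self.encoder(self.embed(question_ids))  # (batch, seq_len, d_model)
        return torch.sigmoid(self.head(hidden[:, 0])).squeeze(-1)  # one score per question

def distill_step(model, optimizer, question_ids, teacher_confidence):
    # Soft-target distillation: regress the answering model's confidence scores.
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(question_ids), teacher_confidence)
    loss.backward()
    optimizer.step()
    return loss.item()

def should_answer(model, question_ids, threshold=0.5):
    # Filtering: only questions predicted to clear the QA system's operating
    # threshold are forwarded to the expensive answering model.
    with torch.no_grad():
        return model(question_ids) >= threshold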

2020

Beyond Fine-tuning: Few-Sample Sentence Embedding Transfer
Siddhant Garg | Rohit Kumar Sharma | Yingyu Liang
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Fine-tuning (FT) pre-trained sentence embedding models on small datasets has been shown to have limitations. In this paper, we show that concatenating the embeddings from the pre-trained model with those from a simple sentence embedding model trained only on the target data can outperform FT on few-sample tasks. To this end, a linear classifier is trained on the combined embeddings, either with the embedding model weights frozen or with the classifier and embedding models trained end-to-end. We evaluate on seven small datasets from NLP tasks and show that our approach with end-to-end training outperforms FT with negligible computational overhead. Further, we show that sophisticated combination techniques such as CCA and KCCA do not work as well in practice as concatenation. We provide a theoretical analysis to explain this empirical observation.
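
A minimal sketch of the concatenation idea follows, assuming mean-pooled word embeddings as the simple encoder trained only on the target data and fixed sentence embeddings from the pre-trained model as input features; the names SmallSentenceEncoder and ConcatClassifier and the dimensions are illustrative assumptions, not the paper's code.

import torch
import torch.nn as nn

class SmallSentenceEncoder(nn.Module):
    """Simple averaged word-embedding encoder, trained only on the target data."""
    def __init__(self, vocab_size=30522, dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids):                 # (batch, seq_len)
        return self.embed(token_ids).mean(dim=1)  # (batch, dim)

class ConcatClassifier(nn.Module):
    """Linear classifier over [pre-trained embedding ; small-model embedding]."""
    def __init__(self, small_encoder, pretrained_dim=768, small_dim=100, num_classes=2):
        super().__init__()
        self.small_encoder = small_encoder
        self.linear = nn.Linear(pretrained_dim + small_dim, num_classes)

    def forward(self, pretrained_emb, token_ids):
        # pretrained_emb: fixed sentence embeddings from the large pre-trained model
        combined = torch.cat([pretrained_emb, self.small_encoder(token_ids)], dim=-1)
        return self.linear(combined)

# End-to-end variant: the small encoder and the classifier are trained jointly
# on the few-sample task, while the pre-trained embeddings stay fixed.
model = ConcatClassifier(SmallSentenceEncoder())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)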

BAE: BERT-based Adversarial Examples for Text Classification
Siddhant Garg | Goutham Ramakrishnan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Modern text classification models are susceptible to adversarial examples: perturbed versions of the original text that are indiscernible to humans yet misclassified by the model. Recent works in NLP use rule-based synonym replacement strategies to generate adversarial examples. These strategies can lead to out-of-context and unnaturally complex token replacements that are easily identifiable by humans. We present BAE, a black-box attack that generates adversarial examples using contextual perturbations from a BERT masked language model (BERT-MLM). BAE replaces and inserts tokens in the original text by masking a portion of the text and using the BERT-MLM to generate alternatives for the masked tokens. Through automatic and human evaluations, we show that BAE performs a stronger attack while generating adversarial examples with better grammaticality and semantic coherence than prior work.
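
The replace perturbation at the core of BAE can be sketched as follows, in a deliberately simplified form: the full attack also inserts tokens, ranks positions by importance, and filters candidates for semantic similarity, none of which appear here. The fill-mask pipeline and bert-base-uncased are standard Hugging Face components; victim_predict is an assumed black-box function that returns the target classifier's label for a text.

from transformers import pipeline

# BERT masked LM used to propose in-context token replacements.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def bae_replace(text, victim_predict, top_k=10):
    """Return the first perturbed text that flips the victim's prediction, else None."""
    original_label = victim_predict(text)
    tokens = text.split()
    for i in range(len(tokens)):
        # Mask position i and let the MLM suggest contextual alternatives.
        masked = " ".join(tokens[:i] + [fill_mask.tokenizer.mask_token] + tokens[i + 1:])
        for candidate in fill_mask(masked, top_k=top_k):
            new_token = candidate["token_str"].strip()
            if new_token.lower() == tokens[i].lower():
                continue  # skip trivial replacements of the original token
            perturbed = " ".join(tokens[:i] + [new_token] + tokens[i + 1:])
            if victim_predict(perturbed) != original_label:
                return perturbed  # prediction flipped: adversarial example found
    return None  # no flip found within the search budget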

2018

Surprisingly Easy Hard-Attention for Sequence to Sequence Learning
Shiv Shankar | Siddhant Garg | Sunita Sarawagi
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In this paper we show that a simple beam approximation of the joint distribution between attention and output is an easy, accurate, and efficient attention mechanism for sequence-to-sequence learning. The method combines the sharp focus of hard attention with the implementation ease of soft attention. On five translation tasks we show effortless and consistent gains in BLEU over existing attention mechanisms.
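
The contrast with soft attention is easiest to see at a single decoding step: soft attention averages encoder states first and then emits one output distribution, whereas the joint formulation emits an output distribution per attended position and marginalizes over positions with the attention weights. The sketch below is only illustrative, assuming dot-product attention and a shared output projection, and it omits the beam over attention positions across time steps that the paper maintains.

import torch
import torch.nn.functional as F

def soft_attention_step(decoder_state, encoder_states, output_proj):
    # encoder_states: (src_len, hidden); decoder_state: (hidden,)
    scores = encoder_states @ decoder_state          # (src_len,)
    alpha = F.softmax(scores, dim=0)                 # attention weights
    context = alpha @ encoder_states                 # average encoder states first ...
    return F.softmax(output_proj(context), dim=-1)   # ... then one output distribution

def joint_hard_attention_step(decoder_state, encoder_states, output_proj):
    scores = encoder_states @ decoder_state
    alpha = F.softmax(scores, dim=0)                               # p(attend to position a)
    per_position = F.softmax(output_proj(encoder_states), dim=-1)  # p(y | a) for each a
    return alpha @ per_position                      # marginalize: sum_a p(a) p(y | a)

# Tiny usage check with random tensors.
hidden, src_len, vocab = 8, 5, 12
proj = torch.nn.Linear(hidden, vocab)
enc, dec = torch.randn(src_len, hidden), torch.randn(hidden)
print(joint_hard_attention_step(dec, enc, proj).sum())  # sums to ~1: a valid distribution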