Shourya Roy


“Are you calling for the vaporizer you ordered?” Combining Search and Prediction to Identify Orders in Contact Centers
Abinaya K | Shourya Roy
Proceedings of the 4th Workshop on e-Commerce and NLP

With the growing footprint of ecommerce worldwide, the role of contact centers is becoming increasingly crucial for customer satisfaction. To handle scale effectively and manage operational cost, automation through chat-bots and voice-bots is being rapidly adopted. Because customers often have multiple active orders, sometimes a long list, the first task of a voice-bot is to identify which one they are calling about. To solve this problem, which we refer to as order identification, we propose a two-stage real-time technique that combines search and prediction in a sequential manner. In the first stage, analogous to retrieval-based question-answering, a fuzzy search technique uses customized textual similarity measures on noisy transcripts of calls to retrieve the order of interest. The coverage of fuzzy search is limited when customers give no or limited response to voice prompts. Hence, in the second stage, we introduce a predictive solution that predicts the most likely order a customer is calling about based on certain features of orders. We compare with multiple relevant techniques based on word embeddings as well as ecommerce product search to show that the proposed approach provides the best performance, with 64% coverage and 87% accuracy on a large real-life dataset. A system based on the proposed technique is also deployed in production for a fraction of calls landing in the contact center of a large ecommerce provider, providing real evidence of operational benefits as well as increased customer delight.
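The two-stage flow described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the similarity measure (token-level `difflib.SequenceMatcher` ratio), the threshold of 0.8, the order fields `title` and `days_since_update`, and the "most recently updated order" fallback are all assumptions chosen only to show how a search stage can hand over to a prediction stage when the transcript is empty or matches nothing well.

```python
from difflib import SequenceMatcher


def fuzzy_score(utterance: str, product_title: str) -> float:
    """Best similarity (0..1) between any utterance token and any title token."""
    u_tokens = utterance.lower().split()
    t_tokens = product_title.lower().split()
    if not u_tokens or not t_tokens:
        return 0.0
    return max(
        SequenceMatcher(None, u, t).ratio() for u in u_tokens for t in t_tokens
    )


def identify_order(utterance, orders, threshold=0.8):
    """Return (order, stage) where stage is "search" or "prediction".

    `orders` is a list of dicts with the hypothetical keys
    "title" and "days_since_update".
    """
    # Stage 1: fuzzy search of the noisy transcript against order titles.
    if utterance.strip():
        best = max(orders, key=lambda o: fuzzy_score(utterance, o["title"]))
        if fuzzy_score(utterance, best["title"]) >= threshold:
            return best, "search"
    # Stage 2: stand-in heuristic (an assumption, not the paper's model):
    # pick the order with the most recent status update.
    return min(orders, key=lambda o: o["days_since_update"]), "prediction"


orders = [
    {"title": "Steam Vaporizer", "days_since_update": 5},
    {"title": "Running Shoes", "days_since_update": 1},
]
matched, stage = identify_order("i want the vaporiser", orders)
fallback, fallback_stage = identify_order("", orders)
```

Here the misspelled "vaporiser" is still close enough to "Vaporizer" to be resolved by the search stage, while the empty utterance falls through to the prediction stage.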


Learning Transferable Feature Representations Using Neural Networks
Himanshu Sharad Bhatt | Shourya Roy | Arun Rajkumar | Sriranjani Ramakrishnan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Learning representations such that the source and target distributions appear as similar as possible has benefited transfer learning tasks across several applications. Generally, this requires labeled data from the source and only unlabeled data from the target to learn such representations. While these representations act like a bridge to transfer knowledge learned in the source to the target, they may lead to negative transfer when the source-specific characteristics detract from their ability to represent the target data. We present a novel neural network architecture to simultaneously learn a two-part representation, based on the principle of segregating the source-specific representation from the common representation. The first part captures the source-specific characteristics while the second part captures the truly common representation. Our architecture optimizes an objective function which acts adversarially for the source-specific part if it contributes towards the cross-domain learning. We empirically show that the two parts of the representation, in different arrangements, outperform existing learning algorithms on the source learning as well as cross-domain tasks on multiple datasets.


SODA: Service Oriented Domain Adaptation Architecture for Microblog Categorization
Himanshu Sharad Bhatt | Sandipan Dandapat | Peddamuthu Balaji | Shourya Roy | Sharmistha Jat | Deepali Semwal
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

Distributed Vector Representations for Unsupervised Automatic Short Answer Grading
Oliver Adams | Shourya Roy | Raghuram Krishnapuram
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)

We address the problem of automatic short answer grading, evaluating a collection of approaches inspired by recent advances in distributional text representations. In addition, we propose an unsupervised approach for determining text similarity using one-to-many alignment of word vectors. We evaluate the proposed technique across two datasets from different domains, namely, computer science and English reading comprehension, that additionally vary between high-school and undergraduate students. Experiments demonstrate that the proposed technique often outperforms other compositional distributional semantics approaches as well as vector space methods such as latent semantic analysis. When combined with a scoring scheme, the proposed technique provides a powerful tool for tackling the complex problem of short answer grading. We also discuss a number of other key points worthy of consideration in preparing viable, easy-to-deploy automatic short-answer grading systems for the real world.
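One possible reading of "one-to-many alignment of word vectors" is sketched below in Python. The abstract does not give the exact formulation, so this is an assumed interpretation: each model-answer word is aligned to its most similar student-answer word by cosine similarity (so one student word may serve many model words), and the alignment scores are averaged. The function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def alignment_similarity(student_tokens, model_tokens, vectors):
    """Average, over model-answer words, of the best cosine match in the student answer.

    Words missing from `vectors` are skipped (an assumption of this sketch).
    """
    model_vecs = [vectors[t] for t in model_tokens if t in vectors]
    student_vecs = [vectors[t] for t in student_tokens if t in vectors]
    if not model_vecs or not student_vecs:
        return 0.0
    best_matches = [max(cosine(m, s) for s in student_vecs) for m in model_vecs]
    return sum(best_matches) / len(best_matches)


# Toy vectors for illustration only; real use would load pretrained embeddings.
toy_vectors = {"stack": [1.0, 0.0], "lifo": [1.0, 0.0], "queue": [0.0, 1.0]}
same_meaning = alignment_similarity(["lifo"], ["stack"], toy_vectors)
different_meaning = alignment_similarity(["queue"], ["stack"], toy_vectors)
```

A similarity of this kind would still need a scoring scheme on top, as the abstract notes, to turn it into a grade.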

A Fluctuation Smoothing Approach for Unsupervised Automatic Short Answer Grading
Shourya Roy | Sandipan Dandapat | Y. Narahari
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)

We offer a fluctuation smoothing computational approach for unsupervised automatic short answer grading (ASAG) techniques in the educational ecosystem. A major drawback of the existing techniques is the significant effect that variations in model answers could have on their performance. The proposed fluctuation smoothing approach, based on classical sequential pattern mining, exploits lexical overlap in students’ answers to any typical question. We empirically demonstrate using multiple datasets that the proposed approach improves the overall performance and significantly reduces (by up to 63%) the variation in performance (standard deviation) of unsupervised ASAG techniques. We bring in additional benchmarks such as (a) paraphrasing of model answers and (b) using answers by the k top-performing students as model answers, to amplify the benefits of the proposed approach.

Wisdom of Students: A Consistent Automatic Short Answer Grading Technique
Shourya Roy | Sandipan Dandapat | Ajay Nagesh | Y. Narahari
Proceedings of the 13th International Conference on Natural Language Processing

Cross-domain Text Classification with Multiple Domains and Disparate Label Sets
Himanshu Sharad Bhatt | Manjira Sinha | Shourya Roy
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)


An Iterative Similarity based Adaptation Technique for Cross-domain Text Classification
Himanshu Sharad Bhatt | Deepali Semwal | Shourya Roy
Proceedings of the Nineteenth Conference on Computational Natural Language Learning

Feature Selection for Short Text Classification using Wavelet Packet Transform
Anuj Mahajan | Sharmistha Jat | Shourya Roy
Proceedings of the Nineteenth Conference on Computational Natural Language Learning


TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain
Anoop Kunchukuttan | Rajen Chatterjee | Shourya Roy | Abhijit Mishra | Pushpak Bhattacharyya
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations


Experiences in Resource Generation for Machine Translation through Crowdsourcing
Anoop Kunchukuttan | Shourya Roy | Pratik Patel | Kushal Ladha | Somya Gupta | Mitesh M. Khapra | Pushpak Bhattacharyya
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The logistics of collecting resources for Machine Translation (MT) has always been a cause of concern for some of the resource-deprived languages of the world. The recent advent of crowdsourcing platforms provides an opportunity to explore the large-scale generation of resources for MT. However, before venturing into this mode of resource collection, it is important to understand the various factors, such as task design, crowd motivation, and quality control, which can influence the success of such a crowdsourcing venture. In this paper, we present our experiences based on a series of experiments performed. This is an attempt to provide a holistic view of the different facets of translation crowdsourcing and to identify key challenges which need to be addressed for building a practical crowdsourcing solution for MT.


Automatic Identification of Important Segments and Expressions for Mining of Business-Oriented Conversations at Contact Centers
Hironori Takeuchi | L Venkata Subramaniam | Tetsuya Nasukawa | Shourya Roy
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)


Automatic Generation of Domain Models for Call-Centers from Noisy Transcriptions
Shourya Roy | L Venkata Subramaniam
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics