Manoj Chinnakotla

Also published as: Manoj K. Chinnakotla, Manoj Kumar Chinnakotla

2018

Code-Mixing (CM) is the phenomenon of alternating between two or more languages which is prevalent in bi- and multi-lingual communities. Most NLP applications today are still designed with the assumption of a single interaction language and are most likely to break given a CM utterance with multiple languages mixed at a morphological, phrase or sentence level. For example, popular commercial search engines do not yet fully understand the intents expressed in CM queries. As a first step towards fostering research which supports CM in NLP applications, we systematically crowd-sourced and curated an evaluation dataset for factoid question answering in three CM languages - Hinglish (Hindi+English), Tenglish (Telugu+English) and Tamlish (Tamil+English) which belong to two language families (Indo-Aryan and Dravidian). We share the details of our data collection process, techniques which were used to avoid inducing lexical bias amongst the crowd workers and other CM specific linguistic properties of the dataset. Our final dataset, which is available freely for research purposes, has 1,694 Hinglish, 2,848 Tamlish and 1,391 Tenglish factoid questions and their answers. We discuss the techniques used by the participants for the first edition of this ongoing challenge.

pdf abs
Transliteration Better than Translation? Answering Code-mixed Questions over a Knowledge Base
Vishal Gupta | Manoj Chinnakotla | Manish Shrivastava
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching

Humans can learn multiple languages. If they know a fact in one language, they can answer a question in another language they understand. They can also answer Code-mix (CM) questions: questions which contain both languages. This behavior is attributed to the unique learning ability of humans. Our task aims to study if machines can achieve this. We demonstrate how effectively a machine can answer CM questions. In this work, we adopt a two phase approach: candidate generation and candidate re-ranking to answer questions. We propose a Triplet-Siamese-Hybrid CNN (TSHCNN) to re-rank candidate answers. We show experiments on the SimpleQuestions dataset. Our network is trained only on English questions provided in this dataset and noisy Hindi translations of these questions and can answer English-Hindi CM questions effectively without the need of translation into English. Back-transliterated CM questions outperform their lexical and sentence level translated counterparts by 5% & 35% in accuracy respectively, highlighting the efficacy of our approach in a resource constrained setting.

pdf abs
Retrieve and Re-rank: A Simple and Effective IR Approach to Simple Question Answering over Knowledge Graphs
Vishal Gupta | Manoj Chinnakotla | Manish Shrivastava
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

SimpleQuestions is a commonly used benchmark for single-factoid question answering (QA) over Knowledge Graphs (KG). Existing QA systems rely on various components to solve different sub-tasks of the problem (such as entity detection, entity linking, relation prediction and evidence integration). In this work, we propose a different approach to the problem and present an information retrieval style solution for it. We adopt a two-phase approach: candidate generation and candidate re-ranking to answer questions. We propose a Triplet-Siamese-Hybrid CNN (TSHCNN) to re-rank candidate answers. Our approach achieves an accuracy of 80% which sets a new state-of-the-art on the SimpleQuestions dataset.

pdf abs
Automatic Spelling Correction for Resource-Scarce Languages using Deep Learning
Pravallika Etoori | Manoj Chinnakotla | Radhika Mamidi
Proceedings of ACL 2018, Student Research Workshop

Spelling correction is a well-known task in Natural Language Processing (NLP). Automatic spelling correction is important for many NLP applications like web search engines, text summarization, sentiment analysis etc. Most approaches use parallel data of noisy and correct word mappings from different sources as training data for automatic spelling correction. Indic languages are resource-scarce and do not have such parallel data due to low volume of queries and non-existence of such prior implementations. In this paper, we show how to build an automatic spelling corrector for resource-scarce languages. We propose a sequence-to-sequence deep learning model which trains end-to-end. We perform experiments on synthetic datasets created for Indic languages, Hindi and Telugu, by incorporating the spelling mistakes committed at character level. A comparative evaluation shows that our model is competitive with the existing spell checking and correction techniques for Indic languages.

2016

pdf abs
Hand in Glove: Deep Feature Fusion Network Architectures for Answer Quality Prediction in Community Question Answering
Sai Praneeth Suggu | Kushwanth Naga Goutham | Manoj K. Chinnakotla | Manish Shrivastava
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Community Question Answering (cQA) forums have become a popular medium for soliciting direct answers to specific questions of users from experts or other experienced users on a given topic. However, for a given question, users sometimes have to sift through a large number of low-quality or irrelevant answers to find out the answer which satisfies their information need. To alleviate this, the problem of Answer Quality Prediction (AQP) aims to predict the quality of an answer posted in response to a forum question. Current AQP systems either learn models using - a) various hand-crafted features (HCF) or b) Deep Learning (DL) techniques which automatically learn the required feature representations. In this paper, we propose a novel approach for AQP known as - “Deep Feature Fusion Network (DFFN)” which combines the advantages of both hand-crafted features and deep learning based systems. Given a question-answer pair along with its metadata, the DFFN architecture independently - a) learns features from the Deep Neural Network (DNN) and b) computes hand-crafted features using various external resources and then combines them using a fully connected neural network trained to predict the final answer quality. DFFN is end-end differentiable and trained as a single system. We propose two different DFFN architectures which vary mainly in the way they model the input question/answer pair - DFFN-CNN uses a Convolutional Neural Network (CNN) and DFFN-BLNA uses a Bi-directional LSTM with Neural Attention (BLNA). Both these proposed variants of DFFN (DFFN-CNN and DFFN-BLNA) achieve state-of-the-art performance on the standard SemEval-2015 and SemEval-2016 benchmark datasets and outperforms baseline approaches which individually employ either HCF or DL based techniques alone.

pdf
Together we stand: Siamese Networks for Similar Question Retrieval
Arpita Das | Harish Yenala | Manoj Chinnakotla | Manish Shrivastava
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Co-authors

Venues

acl5
ijcnlp1
news1
emnlp1
ws1
show all...

icon1

coling1