Sanket Shah


2019

Unsung Challenges of Building and Deploying Language Technologies for Low Resource Language Communities
Pratik Joshi | Christain Barnes | Sebastin Santy | Simran Khanuja | Sanket Shah | Anirudh Srinivasan | Satwik Bhattamishra | Sunayana Sitaram | Monojit Choudhury | Kalika Bali
Proceedings of the 16th International Conference on Natural Language Processing

In this paper, we examine and analyze the challenges associated with developing and introducing language technologies to low-resource language communities. While doing so, we bring to light the successes and failures of past work in this area, the challenges it has faced, and what it has achieved. Throughout this paper, we take a problem-facing approach and describe the essential factors on which the success of such technologies hinges. We present the various aspects in a manner that clarifies and lays out the different tasks involved, which can aid organizations looking to make an impact in this area. We take the example of Gondi, an extremely low-resource Indian language, to reinforce and complement our discussion.

CoSSAT: Code-Switched Speech Annotation Tool
Sanket Shah | Pratik Joshi | Sebastin Santy | Sunayana Sitaram
Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP

Code-switching refers to the alternation of two or more languages in a conversation or utterance and is common in multilingual communities across the world. Building code-switched speech and natural language processing systems is challenging due to the lack of annotated speech and text data. We present CoSSAT, a speech annotation interface that helps annotators transcribe code-switched speech faster, more easily, and more accurately than a traditional interface by displaying candidate words from monolingual speech recognizers. We conduct a user study on the transcription of Hindi-English code-switched speech with 10 annotators and describe quantitative and qualitative results.