Shikhar Kr. Sarma

Also published as: Shikhar Kr Sarma, Shikhar Sarma, Shikhar Sharma


Assamese Word Sense Disambiguation using Genetic Algorithm
Arjun Gogoi | Nomi Baruah | Shikhar Kr. Sarma
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

Word sense disambiguation (WSD) is the problem of determining the sense of a word according to the context in which it occurs. A considerable amount of work has been done on WSD for some languages such as English, but research on Assamese WSD remains limited. It is a more demanding task because Assamese has intrinsic complexity in its writing structure and exhibits ambiguity at the syntactic, semantic, and anaphoric levels. A novel unsupervised genetic word sense disambiguation algorithm is proposed in this paper. The algorithm first uses WordNet to extract all possible senses for a given ambiguous word; a genetic algorithm then takes Wu-Palmer’s similarity measure as the fitness function and calculates the similarity measure for all extracted senses. The sense with the highest score is declared the winner sense.
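The abstract does not spell out the chromosome encoding or the genetic operators, so the following is only a minimal sketch of the general idea, under stated assumptions: a tiny hand-made hypernym tree (`PARENT`) and sense inventory (`SENSES`) stand in for WordNet, a chromosome picks one sense per context word, and fitness is the sum of pairwise Wu-Palmer similarities. All names and parameters here are hypothetical and are not taken from the paper.

```python
import random
from itertools import combinations

# Toy hypernym tree (child -> parent). Illustrative stand-in for WordNet.
PARENT = {
    "entity": None,
    "finance_concept": "entity",
    "geology_concept": "entity",
    "bank_finance": "finance_concept",
    "deposit_money": "finance_concept",
    "loan": "finance_concept",
    "bank_river": "geology_concept",
    "deposit_sediment": "geology_concept",
}

# Hypothetical sense inventory: word -> candidate senses.
SENSES = {
    "bank": ["bank_finance", "bank_river"],
    "deposit": ["deposit_money", "deposit_sediment"],
    "loan": ["loan"],
}


def ancestors(node):
    """Return the path from a node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth of a node, counting the root as depth 1."""
    return len(ancestors(node))


def wup(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    anc_a = set(ancestors(a))
    lcs = next(n for n in ancestors(b) if n in anc_a)  # deepest shared ancestor
    return 2 * depth(lcs) / (depth(a) + depth(b))


def fitness(chrom, words):
    """Sum of pairwise Wu-Palmer similarities between the chosen senses."""
    chosen = [SENSES[w][g] for w, g in zip(words, chrom)]
    return sum(wup(x, y) for x, y in combinations(chosen, 2))


def disambiguate(words, pop_size=8, generations=20, mutation_rate=0.2, seed=0):
    """Evolve sense assignments; each gene is an index into SENSES[word]."""
    rng = random.Random(seed)

    def random_gene(word):
        return rng.randrange(len(SENSES[word]))

    pop = [[random_gene(w) for w in words] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, words), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; the best survive
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, len(words)) if len(words) > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [
                random_gene(w) if rng.random() < mutation_rate else g
                for w, g in zip(words, child)
            ]
            children.append(child)
        pop = parents + children
    best = max(pop, key=lambda c: fitness(c, words))
    return {w: SENSES[w][g] for w, g in zip(words, best)}


if __name__ == "__main__":
    print(disambiguate(["bank", "deposit", "loan"]))
```

In a real setting the toy tree would be replaced by the WordNet hypernym hierarchy for the target language; the selection, crossover, and mutation choices above are generic defaults rather than the authors' configuration.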


Spoken WordNet
Kishore Kashyap | Shikhar Kr Sarma | Kumari Sweta
Proceedings of the 10th Global Wordnet Conference

WordNets have been used in a wide variety of applications, including the design and development of intelligent and human-assisting systems. Although WordNet was initially developed as an online lexical database (Miller, 1995; Fellbaum, 1998), later developments have inspired the use of the WordNet database as a resource in NLP applications and language technology development, and as a source of structured learning materials. This paper proposes, conceptualizes, designs, and develops a voice-enabled information retrieval system that presents WordNet knowledge in spoken format in response to a spoken query. In practice, the work converts the WordNet resource into a structured voice-based knowledge extraction system, where a spoken query is processed in a pipeline, the relevant WordNet resources are extracted and structured through another processing pipeline, and the result is presented in spoken format. Thus the system provides a speech interface to the existing WordNet, and we name the system “Spoken WordNet”. The system offers two interfaces, one designed and developed for the Web and the other as an app interface for smartphones. This also amounts to restructuring WordNet into a friendly version for visually challenged users. The user can input a query string in the form of a spoken English sentence or word. Jaccard similarity is calculated between the input sentence and the synset definitions. The synset with the highest similarity score is taken as the synset of interest among the multiple available synsets. The user is also prompted to choose a contextual synset in case of ambiguity.
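The synset selection step described above can be sketched as follows. This is an illustrative example only: the synset names and definitions in `SYNSETS` are made up for the demonstration, and the tokenization (lowercased alphabetic words, no stop-word removal) is an assumption, since the abstract does not describe the preprocessing.

```python
import re

# Hypothetical synset definitions; not taken from the actual WordNet database.
SYNSETS = {
    "bank.n.01": "sloping land beside a body of water",
    "bank.n.02": "a financial institution that accepts deposits",
}


def tokens(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_synsets(query, synsets):
    """Score every synset definition against the query, best first."""
    q = tokens(query)
    scored = [(name, jaccard(q, tokens(gloss))) for name, gloss in synsets.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_synsets("the bank beside the water", SYNSETS):
        print(f"{name}\t{score:.3f}")
```

When the top scores are tied or close, a system of this kind could fall back to asking the user to pick a synset, which matches the prompting behaviour the abstract mentions.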


A Frame Tracking Model for Memory-Enhanced Dialogue Systems
Hannes Schulz | Jeremie Zumer | Layla El Asri | Shikhar Sharma
Proceedings of the 2nd Workshop on Representation Learning for NLP

Recently, resources and tasks were proposed to go beyond state tracking in dialogue systems. An example is the frame tracking task, which requires recording multiple frames, one for each user goal set during the dialogue. This allows a user, for instance, to compare items corresponding to different goals. This paper proposes a model which takes as input the list of frames created so far during the dialogue, the current user utterance as well as the dialogue acts, slot types, and slot values associated with this utterance. The model then outputs the frame being referenced by each triple of dialogue act, slot type, and slot value. We show that on the recently published Frames dataset, this model significantly outperforms a previously proposed rule-based baseline. In addition, we propose an extensive analysis of the frame tracking task by dividing it into sub-tasks and assessing their difficulty with respect to our model.
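To make the task's input and output concrete, here is a minimal sketch of the data involved. It is not the paper's model and not its rule-based baseline: the `Frame` and `Triple` types, the `track` function, and the matching heuristic (prefer the most recent frame that already holds the slot value, otherwise fall back to the active frame) are all hypothetical, chosen only to show what "assigning a frame to each triple" means.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One user goal: a frame id plus the slot values recorded for it."""
    frame_id: int
    slots: dict = field(default_factory=dict)  # slot type -> slot value


@dataclass(frozen=True)
class Triple:
    """A (dialogue act, slot type, slot value) annotation on an utterance."""
    act: str
    slot_type: str
    slot_value: str


def track(frames, triples, active_id):
    """Map each triple to a frame id using a naive illustrative heuristic."""
    assignment = {}
    for t in triples:
        matches = [
            f.frame_id for f in frames if f.slots.get(t.slot_type) == t.slot_value
        ]
        # Prefer the most recently created matching frame, else the active one.
        assignment[t] = matches[-1] if matches else active_id
    return assignment


if __name__ == "__main__":
    frames = [
        Frame(1, {"dst_city": "Paris"}),
        Frame(2, {"dst_city": "Rome"}),
    ]
    triples = [
        Triple("inform", "dst_city", "Paris"),
        Triple("inform", "budget", "1000"),
    ]
    for triple, frame_id in track(frames, triples, active_id=2).items():
        print(triple, "->", frame_id)
```

A learned model would replace the exact-match rule with a scoring function over frames, which is where the utterance text and the frame contents described in the abstract would come into play.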

Frames: a corpus for adding memory to goal-oriented dialogue systems
Layla El Asri | Hannes Schulz | Shikhar Sharma | Jeremie Zumer | Justin Harris | Emery Fine | Rahul Mehrotra | Kaheer Suleman
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

This paper proposes a new dataset, Frames, composed of 1369 human-human dialogues with an average of 15 turns per dialogue. This corpus contains goal-oriented dialogues between users who are given some constraints to book a trip and assistants who search a database to find appropriate trips. The users exhibit complex decision-making behaviour which involves comparing trips, exploring different options, and selecting among the trips that were discussed during the dialogue. To drive research on dialogue systems towards handling such behaviour, we have annotated and released the dataset and we propose in this paper a task called frame tracking. This task consists of keeping track of different semantic frames throughout each dialogue. We propose a rule-based baseline and analyse the frame tracking task through this baseline.


A Quantitative Analysis of Synset of Assamese WordNet: Its Position and Timeline
Shikhar Sarma | Dibyajyoti Sarmah | Ratul Deka | Anup Barman | Jumi Sarmah | Himadri Bharali | Mayashree Mahanta | Umesh Deka
Proceedings of the Seventh Global Wordnet Conference

An Analytical Study of Synonymy in Assamese Language Using WorldNet: Classification and Structure
Himadri Bharali | Mayashree Mahanta | Shikhar Kr. Sarma | Utpal Saikia | Dibyajyoti Sarmah
Proceedings of the Seventh Global Wordnet Conference

Assamese WordNet based Quality Enhancement of Bilingual Machine Translation System
Anup Barman | Jumi Sarmah | Shikhar Sarma
Proceedings of the Seventh Global Wordnet Conference


Building Multilingual Lexical Resources using Wordnets: Structure, Design and Implementation
Shikhar Kr. Sarma | Dibyajyoti Sarmah | Biswajit Brahma | Himadri Bharali | Mayashree Mahanta | Utpal Saikia
Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon

A Structured Approach for Building Assamese Corpus: Insights, Applications and Challenges
Shikhar Kr. Sarma | Himadri Bharali | Ambeswar Gogoi | Ratul Deka | Anup Kr. Barman
Proceedings of the 10th Workshop on Asian Language Resources

Corpus Building of Literary Lesser Rich Language-Bodo: Insights and Challenges
Biswajit Brahma | Anup Kr. Barman | Shikhar Kr. Sarma | Bhatima Boro
Proceedings of the 10th Workshop on Asian Language Resources

Structured and Logical Representations of Assamese Text for Question-Answering System
Shikhar Kr. Sarma | Rita Chakraborty
Proceedings of the Workshop on Question Answering for Complex Domains