Prajwol Shrestha

Also published as: Prajol Shrestha


2016

Codeswitching Detection via Lexical Features in Conditional Random Fields
Prajwol Shrestha
Proceedings of the Second Workshop on Computational Approaches to Code Switching

2014

Incremental N-gram Approach for Language Identification in Code-Switched Text
Prajwol Shrestha
Proceedings of the First Workshop on Computational Approaches to Code Switching

2012

Participation du LINA à DEFT2012 (LINA at DEFT2012) [in French]
Florian Boudin | Amir Hazem | Nicolas Hernandez | Prajol Shrestha
JEP-TALN-RECITAL 2012, Workshop DEFT 2012: DÉfi Fouille de Textes (DEFT 2012 Workshop: Text Mining Challenge)

2011

Reduction of Search Space to Annotate Monolingual Corpora
Prajol Shrestha | Christine Jacquin | Beatrice Daille
Proceedings of 5th International Joint Conference on Natural Language Processing

Alignment of Monolingual Corpus by Reduction of the Search Space
Prajol Shrestha
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues

Monolingual comparable corpora annotated with similarity-based alignments between text segments (paragraphs, sentences, etc.) are used in a wide range of natural language processing applications, such as plagiarism detection, information retrieval, and summarization. The drawback is that few standard aligned corpora are available, so such a corpus has to be created manually, which is a time-consuming and costly task. In this paper, we propose a method that significantly reduces the search space for manual alignment of a monolingual comparable corpus, which in turn makes the alignment process faster and easier. The method can be used to align text segments at different levels. Using it, we create our own gold corpus aligned at the paragraph level, which will be used for building and testing our algorithms for automatic alignment. We also present experiments on reducing the search space on the basis of stem overlap, word overlap, and cosine similarity, which help us automate the process to some extent and reduce the human effort required for alignment.
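
To illustrate the general idea of pruning candidate pairs before manual alignment, here is a minimal sketch. It is not the authors' implementation: the function names, the tokenizer, the overlap normalization, and the threshold of 0.2 are all assumptions made for the example. It scores every source/target paragraph pair with word overlap or cosine similarity and keeps only the pairs that reach the threshold, so an annotator would inspect far fewer pairs.

```python
import math
import re
from collections import Counter
from itertools import product


def tokenize(text):
    """Lowercase a text and split it into alphanumeric word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_overlap(a, b):
    """Shared vocabulary divided by the size of the smaller vocabulary (assumed normalization)."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))


def cosine(a, b):
    """Cosine similarity between raw term-frequency vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def candidate_pairs(source, target, measure=cosine, threshold=0.2):
    """Return the (i, j) index pairs whose similarity reaches the threshold.

    The threshold value is hypothetical; pairs below it are dropped from
    the search space that a human annotator would have to review.
    """
    src = [tokenize(p) for p in source]
    tgt = [tokenize(p) for p in target]
    return [
        (i, j)
        for i, j in product(range(len(src)), range(len(tgt)))
        if measure(src[i], tgt[j]) >= threshold
    ]


source = ["The cat sat on the mat.", "Stock prices fell sharply today."]
target = ["A cat was sitting on a mat.", "Markets dropped as stock prices fell."]
pairs = candidate_pairs(source, target)
```

A stem-overlap variant would apply a stemmer to the tokens before calling `word_overlap`; the stemmer is left out here to keep the sketch free of external dependencies.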

Corpus-Based methods for Short Text Similarity
Prajol Shrestha
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues (articles courts)

This paper presents corpus-based methods for measuring the similarity between short texts (sentences, paragraphs, ...), a task with many applications in NLP. Previous work on this problem has relied on supervised methods or on external resources such as WordNet or the British National Corpus. Our methods are unsupervised and corpus-based. We present a new method, based on the Vector Space Model, that captures the contextual behavior, senses, and correlation of terms, and show that it performs better than a baseline that uses the vector-based cosine similarity measure. We also evaluate two existing document similarity measures, Dice and Resemblance, which to our knowledge have not previously been used for short text similarity. Finally, we show that the vector-based baseline improves when stems are used instead of words and when its parameters are computed from the candidate sentences rather than from an external resource.
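
For readers unfamiliar with the two set-based measures named above, the following sketch shows their standard definitions applied to word sets. It is an illustration under assumptions, not the paper's code: the function names and the whitespace tokenizer are hypothetical, Resemblance is computed here over single words (it is often defined over word shingles), and the paper's new Vector Space Model method is not reproduced.

```python
def word_set(text):
    """Lowercase a text and return its set of whitespace-separated words (assumed tokenizer)."""
    return set(text.lower().split())


def dice(a, b):
    """Dice coefficient: 2 * |A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def resemblance(a, b):
    """Resemblance (Jaccard index): |A ∩ B| / |A ∪ B|, here over single words."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


s1 = word_set("the cat sat on the mat")
s2 = word_set("the cat lay on the rug")

dice_score = dice(s1, s2)                 # 3 shared words, 5 + 5 words in total
resemblance_score = resemblance(s1, s2)   # 3 shared words, 7 distinct words overall
```

Both measures ignore term frequency, which is one way they differ from the vector-based cosine baseline described in the abstract.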