Anupam Basu

In this paper we present an open-source online computational framework that different research groups can use to conduct reading research on Indian language texts. The framework can be used to build a large annotated dataset on Indian language text comprehension from different user-based experiments. Its novelty lies in bringing different empirical data-collection techniques for text comprehension under one roof. The framework has been customized specifically to address the particularities of Indian languages. It will also offer many types of automatic analysis of the data at different levels, such as the full-text, sentence, and word levels. To address the subjectivity of text difficulty perception, the framework allows user background to be captured against multiple factors. The assimilated data can be automatically cross-referenced against varying strata of readers.

2012

Forward Transliteration of Dzongkha Text to Braille
Tirthankar Dasgupta | Manjira Sinha | Anupam Basu
Proceedings of the Second Workshop on Advances in Text Input Methods

Automatic Extraction of Compound Verbs from Bangla Corpora
Sibanshu Mukhopadhayay | Tirthankar Dasgupta | Manjira Sinha | Anupam Basu
Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing

A New Semantic Lexicon and Similarity Measure in Bangla
Manjira Sinha | Abhik Jana | Tirthankar Dasgupta | Anupam Basu
Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon

A Hybrid Dependency Parser for Bangla
Arnab Dhar | Sanjay Chatterji | Sudeshna Sarkar | Anupam Basu
Proceedings of the 10th Workshop on Asian Language Resources

Translations of Ambiguous Hindi Pronouns to Possible Bengali Pronouns
Sanjay Chatterji | Sudeshna Sarkar | Anupam Basu
Proceedings of the 10th Workshop on Asian Language Resources

A Three Stage Hybrid Parser for Hindi
Sanjay Chatterji | Arnab Dhar | Sudeshna Sarkar | Anupam Basu
Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages

Modelling the Organization and Processing of Bangla Polymorphemic Words in the Mental Lexicon: A Computational Approach
Tirthankar Dasgupta | Manjira Sinha | Anupam Basu
Proceedings of COLING 2012: Posters

New Readability Measures for Bangla and Hindi Texts
Manjira Sinha | Sakshi Sharma | Tirthankar Dasgupta | Anupam Basu
Proceedings of COLING 2012: Posters

2010

Determining Reliability of Subjective and Multi-label Emotion Annotation through Novel Fuzzy Agreement Measure
Plaban Kr. Bhowmick | Anupam Basu | Pabitra Mitra
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The paper presents a new fuzzy agreement measure $\gamma_f$ for determining agreement in multi-label and subjective annotation tasks. In this annotation framework, a data item may belong to a category or class with a belief value denoting the annotator's degree of confidence in assigning the data item to that category. We provide a notion of disagreement based on the belief values that the annotators assign with respect to a category. The measure $\gamma_f$ is defined through different fuzzy agreement sets based on the distribution of differences between the belief values provided by the annotators. The fuzzy agreement is computed from the average agreement over all the data items and annotators. Finally, we elaborate on the computation of the $\gamma_f$ measure with a case study on emotion text data, where a data item (a sentence) may belong to more than one emotion category with varying belief values.

Resource Creation for Training and Testing of Transliteration Systems for Indian Languages
Sowmya V. B. | Monojit Choudhury | Kalika Bali | Tirthankar Dasgupta | Anupam Basu
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Machine transliteration is used in a number of NLP applications, ranging from machine translation and information retrieval to input mechanisms for non-Roman scripts. Many popular Input Method Editors for Indian languages, such as Baraha, Akshara, and Quillpad, use back-transliteration as a mechanism to allow users to input text in a number of Indian languages. The lack of a standard dataset for evaluating these systems makes it difficult to draw any meaningful comparison of their relative accuracies. In this paper, we describe the methodology for creating a dataset of ~2500 transliterated sentence pairs each in Bangla, Hindi and Telugu. The data was collected across three different modes from a total of 60 users. We believe that this dataset will prove useful not only for the evaluation and training of back-transliteration systems but also for the linguistic analysis of the process of transliterating Indian languages from native scripts to Roman.