Harshit Surana


2008

Estimating the Resource Adaption Cost from a Resource Rich Language to a Similar Resource Poor Language
Anil Kumar Singh | Kiran Pala | Harshit Surana
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Developing resources for Natural Language Processing is an extremely difficult task for any language, and even more so for less privileged (or less computerized) languages. One way to overcome this difficulty is to adapt the resources of a linguistically close, resource-rich language. In this paper we discuss how the cost of such adaption can be estimated using subjective and objective measures of linguistic similarity, so that financial resources, time, manpower, etc. can be allocated. Since this is the first work of its kind, the method described in this paper should be seen only as a preliminary method, indicative of how better methods can be developed. Corpora of several less computerized languages had to be collected for this work, which was difficult because not much electronic data is available for many of these varieties. Even where data is available, it is in non-standard encodings, so we had to build encoding converters for these varieties. The varieties we have focused on are some of those spoken in the South Asian region.

A More Discerning and Adaptable Multilingual Transliteration Mechanism for Indian Languages
Harshit Surana | Anil Kumar Singh
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

Aggregating Machine Learning and Rule Based Heuristics for Named Entity Recognition
Karthik Gali | Harshit Surana | Ashwini Vaidya | Praneeth Shishtla | Dipti Misra Sharma
Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages

2007

Can Corpus Based Measures be Used for Comparative Study of Languages?
Anil Kumar Singh | Harshit Surana
Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology