Hemant Darbari


2020

TREE ADJOINING GRAMMAR BASED LANGUAGE INDEPENDENT GENERATOR
Pavan Kurariya | Prashant Chaudhary | Jahnavi Bodhankar | Lenali Singh | Ajai Kumar | Hemant Darbari
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

This paper proposes a language-independent natural language generator for a Tree Adjoining Grammar (TAG)[8] based machine translation system. In this model, a TAG-based parsing and generation approach is used for the syntactic and semantic analysis of the source language. The model provides an efficient and systematic way of encapsulating language resources within an engineering solution for developing the machine translation system. A TAG-based generator is developed from existing resources using the TAG formalism to generate the target language from the derivation produced by the TAG-based parser. The process allows syntactic feature marking, subject-predicate agreement marking, and multiple synthesized outputs in complex and morphologically rich languages. The challenge in applying such an approach is handling linguistically diverse features. This is achieved using a rule-based translation grammar model that aligns the source language to the corresponding target languages. Computational experiments demonstrate that substantial performance in terms of time and memory can also be obtained with this approach. The paper also describes the process of lexicalization and explains the state charts, the TAG-based adjunction and substitution functions, and the complexity and challenges underlying the parsing-generation process.

1999

Computer assisted translation system – an Indian perspective
Hemant Darbari
Proceedings of Machine Translation Summit VII

Work in the area of machine translation has been going on for several decades, but it was only during the early 1990s that a promising translation technology began to emerge, with advanced research in the fields of Artificial Intelligence and Computational Linguistics. This held the promise of successfully developing usable machine translation systems in certain well-defined domains. C-DAC took up this challenge, as we felt that India, being a multilingual and multicultural country with a population of approximately 950 million people and 18 constitutionally recognized languages, needs a translation system for instant transfer of information and knowledge. The other groups working in the area of English-to-Hindi translation are the National Center for Software Technology (NCST), which is working on the translation of news stories, and the Electronics Research & Development Center of India (ER & DCI), which has developed the Machine Assisted Translation System for the health domain. A major project on translation between Indian languages (Anusaaraka) is also under development at the University of Hyderabad.