Mansouri Rad


2006

In this paper we describe a set of processes for acquiring resources for quick ramp-up machine translation (MT) into English from any language that lacks significant machine-tractable resources, using the Paraguayan indigenous language Guarani, as well as Amharic and Chechen, as examples. Our task is to develop a 250,000 monolingual corpus, a 250,000 bilingual parallel corpus, and smaller corpora tagged with part-of-speech, named-entity, and morphological annotations.