Jian-Cheng Wu

Also published as: Jian-cheng Wu, Jiancheng Wu


2016

2015

2014

2013

2012

2010

2009

2008

We introduce a method for learning to find domain-specific translations for a given term on the Web. In our approach, the source term is transformed into an expanded query aimed at maximizing the probability of retrieving translations from a very large collection of mixed-code documents. The method involves automatically generating sets of target-language words from training data in specific domains, and automatically selecting the target words that are most effective at retrieving documents containing the sought-after translations. At run time, the given term is transformed into an expanded query and submitted to a search engine, and ranked translations are extracted from the document snippets returned by the search engine. We present a prototype, TermMine, which applies the method to a Web search engine. Evaluations over a set of domains and terms show that TermMine outperforms state-of-the-art machine translation systems.
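
The run-time step described in this abstract (expand the query, then extract and rank candidate translations from returned snippets) can be illustrated with a minimal sketch. This is not the TermMine implementation: the function names `expand_query` and `rank_translations` are hypothetical, the snippets are passed in as plain strings rather than fetched from a search engine, candidates are assumed to be runs of CJK ideographs, and ranking by raw frequency is a stand-in for whatever scoring the paper actually uses.

```python
import re
from collections import Counter

# Assumption: the target language is written in CJK ideographs, so a
# candidate translation is any run of two or more such characters.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]{2,}")


def expand_query(term, expansion_words):
    """Build an expanded query: the quoted source term followed by
    target-language expansion words (hypothetical query format)."""
    return " ".join(['"%s"' % term] + list(expansion_words))


def rank_translations(term, snippets, expansion_words=()):
    """Rank candidate translations found in mixed-code snippets.

    Only snippets that mention the source term are considered, and the
    expansion words themselves are excluded from the candidates.
    Candidates are ordered by how often they occur (a simple stand-in
    for a learned ranking function).
    """
    skip = set(expansion_words)
    counts = Counter()
    for snippet in snippets:
        if term.lower() not in snippet.lower():
            continue  # snippet does not mention the source term
        for candidate in CJK_RUN.findall(snippet):
            if candidate not in skip:
                counts[candidate] += 1
    return [candidate for candidate, _ in counts.most_common()]


if __name__ == "__main__":
    snippets = [
        "蛋白質 (protein) 是生物體",
        "protein 蛋白質 結構",
        "unrelated 文字",
    ]
    print(expand_query("protein", ["生物"]))
    print(rank_translations("protein", snippets, ["生物"]))
```

In a real system the snippets would come from a search engine queried with the output of `expand_query`, and the expansion words would be selected from domain-specific training data as the abstract describes.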

2007

2005

2004

Named entities in free text represent a challenge to text analysis in Machine Translation and Cross-Language Information Retrieval. These phrases are often transliterated into another language with a different sound inventory and writing system. Named entities found in free text are often not listed in bilingual dictionaries. Although it is possible to identify and translate named entities on the fly without a list of proper names and transliterations, an extensive list of existing transliterations will certainly ensure a high precision rate. We use a seed list of proper names and transliterations to train a Machine Transliteration Model. With the model, it is possible to extract proper names and their transliterations from monolingual or parallel corpora with high precision and recall rates.

2003