Jian-Cheng Wu

Also published as: Jian-cheng Wu, Jiancheng Wu

We introduce a method for learning to find domain-specific translations for a given term on the Web. In our approach, the source term is transformed into an expanded query aimed at maximizing the probability of retrieving translations from a very large collection of mixed-code documents. The method involves automatically generating sets of target-language words from training data in specific domains and automatically selecting target words for their effectiveness in retrieving documents containing the sought-after translations. At run time, the given term is transformed into an expanded query and submitted to a search engine, and ranked translations are extracted from the document snippets returned by the search engine. We present a prototype, TermMine, which applies the method to a Web search engine. Evaluations over a set of domains and terms show that TermMine outperforms state-of-the-art machine translation systems.

2007

Learning to Find English to Chinese Transliterations on the Web
Jian-Cheng Wu | Jason S. Chang
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2005

Web-Based Unsupervised Learning for Query Formulation in Question Answering
Yi-Chia Wang | Jian-Cheng Wu | Tyne Liang | Jason S. Chang
Second International Joint Conference on Natural Language Processing: Full Papers

Learning Source-Target Surface Patterns for Web-based Terminology Translation
Jian-Cheng Wu | Tracy Lin | Jason S. Chang
Proceedings of the ACL Interactive Poster and Demonstration Sessions

2004

Extraction of name and transliteration in monolingual and parallel corpora
Tracy Lin | Jian-Cheng Wu | Jason S. Chang
Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers

Named entities in free text pose a challenge to text analysis in Machine Translation and Cross-Language Information Retrieval. These phrases are often transliterated into another language with a different sound inventory and writing system. Named entities found in free text are often not listed in bilingual dictionaries. Although it is possible to identify and translate named entities on the fly without a list of proper names and transliterations, an extensive list of existing transliterations will certainly ensure a high precision rate. We use a seed list of proper names and transliterations to train a Machine Transliteration Model. With the model, it is possible to extract proper names and their transliterations from monolingual or parallel corpora with high precision and recall rates.

Using the Web as Corpus for Un-supervised Learning in Question Answering
Yi-Chia Wang | Jian-Cheng Wu | Tyne Liang | Jason S. Chang
Proceedings of the 16th Conference on Computational Linguistics and Speech Processing

Subsentential Translation Memory for Computer Assisted Writing and Translation
Jian-Cheng Wu | Thomas C. Chuang | Wen-Chi Shei | Jason S. Chang
Proceedings of the ACL Interactive Poster and Demonstration Sessions