Jin Yang


Tighter Integration of Rule-Based and Statistical MT in Serial System Combination
Nicola Ueffing | Jens Stephan | Evgeny Matusov | Loïc Dugast | George Foster | Roland Kuhn | Jean Senellart | Jin Yang
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)


SYSTRAN’s Chinese Word Segmentation
Jin Yang | Jean Senellart | Remi Zajac
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

SYSTRAN intuitive coding technology
Jean Senellart | Jin Yang | Anabel Rebollo
Proceedings of Machine Translation Summit IX: Papers

Customizing a general-purpose MT system is an effective way to improve machine translation quality for specific usages. Building a user-specific dictionary is the first and most important step in the customization process. An intuitive dictionary-coding tool was developed and is now utilized to allow the user to build user dictionaries easily and intelligently. SYSTRAN’s innovative and proprietary IntuitiveCoding® technology is the engine powering this tool. It comprises various components: massive linguistic resources, a morphological analyzer, a statistical guesser, a finite-state automaton, and a context-free grammar. Methodologically, IntuitiveCoding® is also a cross-application approach for high-quality dictionary building in terminology import and exchange. This paper describes the various components and the issues involved in their implementation. An evaluation framework and utilization of the technology are also presented.

Customizing complex lexical entries for high-quality MT
Rémi Zajac | Elke Lange | Jin Yang
Proceedings of Machine Translation Summit IX: Papers

The customization of Machine Translation systems concentrates, for the most part, on MT dictionaries. In this paper, we focus on the customization of complex lexical entries that involve various types of lexical collocations, such as sub-categorization frames. We describe methods and tools that leverage existing parsers and other MT dictionaries for customization of MT dictionaries. This customization process has been applied to the large-scale customization of several commercial MT systems, including English to Japanese, Chinese, and Korean.


Pacific Rim portable translator
John Weisgerber | Jin Yang | Pete Fisher
Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: System Descriptions

ARL’s FALCon system has proven its integrated OCR and MT technology to be a valuable asset to soldiers in the field in both Bosnia and Haiti. Now it is being extended to include six more SYSTRAN language pairs in response to the military’s need for automatic translation capabilities as they pursue US national objectives in East Asia. The Pacific Rim Portable Translator will provide robust automatic translation bidirectionally for English, Chinese, Japanese, and Korean, which will allow not only rapid assimilation of foreign information, but two-way communication as well for both the public and private sectors.


Towards the automatic acquisition of lexical selection rules
Jin Yang
Proceedings of Machine Translation Summit VII

This paper is a study of a certain type of collocation and of its implications for, and application to, the acquisition of lexical selection rules in transfer-approach MT systems. Collocations reveal the co-occurrence possibilities of linguistic units in one language, which often require lexical selection rules to enhance the natural flow and clarity of MT output. The study presents an automatic acquisition and human verification process to acquire collocations and suggest possible candidates for lexical selection rules. The mechanism has been used in the development and enhancement of the Chinese-English and Japanese-English MT systems, and can be easily adapted to other language pairs. Future work includes expanding its usage to more language pairs and furthering its application to MT customers.

Automatic domain recognition for machine translation
Elke D. Lange | Jin Yang
Proceedings of Machine Translation Summit VII

This paper describes an ongoing project which has the goal of improving machine translation quality by increasing knowledge about the text to be translated. A basic piece of such knowledge is the domain or subject field of the text. When this is known, it is possible to improve meaning selection appropriate to that domain. Our current effort consists in automating both recognition of the text’s domain and the assignment of domain-specific translations. Results of our implementation show that the approach of using terminology categorization already existing in the machine translation system is very promising.


SYSTRAN on AltaVista
Jin Yang | Elke D. Lange
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers

On December 9, 1997, SYSTRAN and the AltaVista Search Network launched the first widely available, real-time, high-speed and free translation service on the Internet. This initial deployment, treated as a global experiment, has become a tremendous success. Through this service, machine translation (MT) technology has been pushed to the forefront of worldwide awareness. Besides growing media coverage, user response during the first five months has been overwhelming. This paper is a study of the user feedback from the MT developer’s perspective, addressing such questions as: Who are the users? What are their needs? What is their acceptance of MT? What types of texts are being translated? What suggestions do users offer? Finally, this paper outlines our view on opportunities and challenges, and on how to use this feedback to guide future development priorities.


SYSTRAN MT Dictionary Development
Laurie Gerber | Jin Yang
Proceedings of Machine Translation Summit VI: Papers

SYSTRAN has demonstrated success in the MT field with its long history spanning nearly 30 years. As a general-purpose fully automatic MT system, SYSTRAN employs a transfer approach. Among its several components, large, carefully encoded, high-quality dictionaries are critical to SYSTRAN's translation capability. A total of over 2.4 million words and expressions are now encoded in the dictionaries for twelve source language systems (30 language pairs - one per year!). SYSTRAN's dictionaries, along with its parsers, transfer modules, and generators, have been tested on huge amounts of text, and contain large terminology databases covering various domains and detailed linguistic rules. Using these resources, SYSTRAN MT systems have successfully served practical translation needs for nearly 30 years, and built a reputation in the MT world for their large, mature dictionaries. This paper describes various aspects of SYSTRAN MT dictionary development as an important part of the development and refinement of SYSTRAN MT systems. There are 4 major sections: 1) Role and Importance of Dictionaries in the SYSTRAN Paradigm describes the importance of coverage and depth in the dictionaries; 2) Dictionary Structure discusses the specifics of dictionary structure and types of information represented; 3) Dictionary Creation and Update describes the strategy and mechanics of the dictionary development; 4) Past, Present and Future Development provides some perspective on where SYSTRAN has come from and where it is going.