A gentle introduction to MT: theory and current practice
This tutorial provides a nontechnical introduction to machine translation. It reviews the whole scope of MT, outlining briefly its history and the major application areas today, and describing the various kinds of MT techniques that have been invented—from direct replacement through transfer to the holy grail of interlinguas. It briefly outlines the newest statistics-based techniques and provides an introduction to the difficult questions of MT evaluation. Topics include: History and development of MT; Theoretical foundations of MT; Traditional and modern MT techniques; Newest MT research; Thorny questions of evaluating MT systems
How to make MT work for you
A successful MT operation is the result of good planning, an incremental approach, dedicated and talented people, and MT software that is capable of handling the types of text to be translated. The presenter will share insights gained during 20 years of experience with the development and implementation of machine translation at the Pan American Health Organization and the experiences of users of several MT systems. Topics include: evaluating candidate systems, making a commitment, preparing the environment, training, choosing input texts, postediting, building the dictionaries, requesting program enhancements, monitoring progress, and justifying the investment. The MT software available today can increase your productivity if you are willing to put in some initial effort to learn the system and tailor it to your needs. Then MT will start working for you!
MT evaluation: old, new, and recycled
The tutorial addresses the issues peculiar to machine translation evaluation, namely the difficulty in determining what constitutes correct translation, and which types of evaluation are the most meaningful for evaluation "consumers." The tutorial is structured around evaluation methods designed for particular purposes: types of MT design, stages in the development lifecycle, and intended end-use of a system that includes MT. It will provide an overview of the issues and classic approaches to MT evaluation. The traditional processes, such as those outlined in the ALPAC report, will be examined for their value historically and in terms of today's environments. The tutorial also provides an insight into the latest evaluation techniques, designed to capture the value of MT systems in the context of current and future automated text handling processes.
The tutorial will introduce purpose-dependent levels of translation. The levels will be defined and it will be shown how to correct errors in relation to those levels. Error collection procedures as well as file storage and naming procedures will be discussed. Examples will be discussed that show how to read MT output for a particular level of translation. The implications of a purpose-oriented interpretation on client relations will also be introduced, and helpful hints on how to deal with the new parameters will be given. No previous knowledge of machine translation is required for successful participation in this session.
U.S. Government Support and Use of Machine Translation: Current Status
Thomas R. Pedtke
The United States Government has filled a key role in the development and application of Machine Translation technology for over four decades. A recent study by the White House Office of Science and Technology has reaffirmed the importance of this role. Two key world events, the emergence of Internet technology and the collapse of the former Soviet Union, have stimulated rapid changes in the status of Machine Translation requirements and applications. A continuing need for Machine Translation systems in the United States military along with the application of Machine Translation systems on key United States Government networks has made Machine Translation systems available to tens of thousands of users. Advances in automating textual information processes and in testing and evaluation of the technology have further stimulated Machine Translation development and applications. Although budget reductions will impact this continuing growth, renewed cooperation will ameliorate some of the impact and the emerging widespread use of Machine Translation could reverse the budget trends. Age-old arguments between linguists and Machine Translation advocates seem to be giving way to recognition of mutual dependence and the potential for Win/Win outcomes. The past five years have witnessed an accelerated exposure and application of Machine Translation technology in the United States Government unequaled in its 40-year history. However, with some budgetary adjustments, the next five years could be truly phenomenal. Advocates for Machine Translation technology and its applications are poised to meet the 21st Century and the Information Age with renewed vigor and practical applications which promise to end the debate over Machine Translation's viability forever.
First steps in Mechanical Translation
Although the first ideas for mechanical translation were put forward in the seventeenth century, it was not until this century that the means to realize them became available, with the appearance of the electronic computer in the mid 1940s. Fifty years ago, in March 1947, Warren Weaver wrote to Norbert Wiener and met Andrew Booth, mentioning to both the use of computers for translation. The possibilities were investigated during the next seven years, until in January 1954 the first prototype program was demonstrated. This article is a brief chronicle of these early years of mechanizing translation processes.
The Origins of MT
Andrew D. Booth
Kathleen H. V. Booth
Something Old and Something New
Victor H. Yngve
The Fulcrum Approach to Machine Translation
Christine A. Montgomery
Let Me tell You How It Really Was
My First 30 Years with MT
MT at Texas: The Early Years
The Challenge of Keypunching for MT
Roger A. Heller
The Emergence of MT in Europe
Machine Translation Through Language Understanding
This paper describes a general framework for a next-generation machine translation system that translates a text not sentence by sentence but by considering inter-sentential discourse. The method is a step closer to human translation than present-day machine translation systems. Particularly important are detailed discourse analysis and flexible text generation using information obtained from the discourse analysis.
The Current State of Machine Translation
Harold L. Somers
This paper aims to survey the current state of research, development and use of Machine Translation (MT). Under ‘research’ the role of linguistics is discussed, and contrasted with research in ‘analogy-based’ MT. The range of languages covered by MT systems is discussed, and the lack of development for minority languages noted. The new research area of spoken language translation (SLT) is reviewed, with some major differences between SLT and text MT described. Under ‘use and users’ we discuss tools for users: Translation Memory, bilingual concordances and software to help checking for mistranslations. The use of MT on the World Wide Web is also discussed. Regarding pre- and post-editing, the impact of ‘controlled language’ is reviewed, and finally a proposal is made that MT users can revise the input text in the light of errors that the system makes, thus ‘post-editing the source text’.
MT started out as a ‘technology push’: more than 50 years ago, researchers had the bright idea of doing translation with the use of the newly developed computers. MT remained in the technology push area for many years. However, in the nineties we are seeing the ‘market pull’ beginning to play a role and there are good reasons to believe that this trend will continue. MT is going where the market and the users want it to go, and MT will prosper in the future. MT will be available electronically over the network, and MT will be available in environments which also offer a variety of other tools for translation, as well as tools for other types of information management. Also in research and in development of new technologies, MT will further develop, e.g. along the lines of knowledge-based MT, advanced integration of different analysis techniques (rule-based, statistics-based, etc.), integration with speech, etc.
Machine Translation of Interactive Texts
A Real-Time MT System for Translating Broadcast Captions
This presentation demonstrates a new multi-engine machine translation system, which combines knowledge-based and example-based machine translation strategies for real-time translation of business news captions from English to German.
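As a rough illustration of what a multi-engine arrangement can look like, the sketch below runs an example-based engine and a knowledge-based engine on the same caption and keeps the candidate with the highest confidence. It is not the presented system: the abstract does not say how the two strategies are combined, so the selection rule, the confidence scores, the function names and the tiny English-German resources are all assumptions made up for this example.

```python
from typing import Callable, List, Optional, Tuple

# An engine returns (translation, confidence), or None if it cannot handle the input.
Engine = Callable[[str], Optional[Tuple[str, float]]]

# Hypothetical toy resources, invented for illustration only.
EXAMPLES = {"stocks closed higher": "die Aktien schlossen fester"}
LEXICON = {"stocks": "Aktien", "fell": "fielen"}


def example_based(caption: str) -> Optional[Tuple[str, float]]:
    """Return a stored translation when the whole caption matches an example."""
    hit = EXAMPLES.get(caption.lower())
    return (hit, 1.0) if hit else None


def knowledge_based(caption: str) -> Optional[Tuple[str, float]]:
    """Stand-in for a knowledge-based engine: word-by-word lexicon lookup."""
    words = caption.lower().split()
    if not words or any(w not in LEXICON for w in words):
        return None
    return (" ".join(LEXICON[w] for w in words), 0.5)


def translate(caption: str, engines: List[Engine]) -> Optional[str]:
    """Run every engine and keep the candidate with the highest confidence."""
    candidates = [r for e in engines if (r := e(caption)) is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r[1])[0]


ENGINES: List[Engine] = [example_based, knowledge_based]
```

Calling `translate("Stocks fell", ENGINES)` falls through to the lexicon-based stand-in because no stored example matches, while a caption covered by neither engine yields `None`. A real-time system would also need a time budget per caption, which this sketch leaves out.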
Managing Distributed MT Projects Today — A New Challenge
Jennifer A. Brundage
The current trend towards globalization means that even the most modern of industries must constantly re-evaluate its strategies and adapt to new technologies. As a long-time supporter of MT technology, SAP has shown that it can make productive use of competitive, commercial MT products along with other CAT products. In making MT work for them, however, SAP has also had to substantially adapt the products that they received from MT companies. The result, after many years, is a full range of peripheral tools and workflow scenarios that support the use of their MT programs.
MT Research and Development (R&D) in Europe
Exchange Interfaces for Translation Tools
The following paper presents an overview of current discussions of exchange interfaces in the area of multilingual processing. It first discusses the principles which are relevant for the definition of such interfaces; it then presents a state of the art and a proposal in the area of text interfaces, translation memory interfaces, and terminology exchange. The approach is bottom-up, i.e. it starts from existing interfaces and existing requirements, and intends to be of practical use. It reflects the discussions in current multilingual research projects of the EC, like OTELO and AVENTINUS.
R&D for Commercial MT
MT research in the commercial environment tends to be conservative, and to introduce change gradually, both because of limited funds, and the need to quickly turn innovations into product features. However, there are a number of challenges and opportunities that could make commercial research a much more dynamic environment for advancement of the field as a whole.
MT from an Everyday User’s Point of View
This paper discusses the experiences of the specialised Danish translation company Lingtech in its use of MT for the translation of technical texts. The background and motivation for setting up Lingtech as an MT-based company is outlined. After a short general presentation of the PaTrans MT-system, the different tasks we have to perform in relation to our use of MT and the way this work is organized in order to achieve maximum cost-efficiency are described. This leads on to the discussion of problem areas for the everyday user in terms of ergonomics and tools for what may be called 'peripheral' tasks, e.g. pre- and post-editing texts, and dictionary maintenance. In the course of gaining experience in running an MT-based organization, we have identified crucial areas where even relatively simple tools can have quite an impact on the overall productivity and profitability of using MT. Given the state of the art in language technology, many useful tools can now be made for the MT-user; however, we argue that too little attention has been given to these aspects so far and that they may indeed be critical to the commercial success of machine translation.
Translating Scientific Texts using MT and MAT Systems: Practical Experience of a Professional Translator
The paper describes practical experience of a professional translator. The task consisted in translating 400 pages of Russian scientific materials (covering all fundamental sciences) into English within a month. The job was fulfilled using three computer-based systems: PARS, a Russian-English bidirectional machine translation system by Lingvistica '93 Co., Polyglossum, dictionary-support software by ETS Ltd., and the Random House electronic dictionary of the English language. The paper analyzes the pluses and minuses of translating scientific texts using computer programs, and gives numerous examples of translations. The main conclusion is that machine translation has no reasonable alternative when a large volume of scientific texts is to be translated professionally within a short period of time.
User-Friendly Machine Translation: Alternate Translations Based on Differing Beliefs
In this paper the authors present a notion of “user-friendly” translation and describe a method for achieving it within a pragmatics-based approach to machine translation. The approach relies on modeling the beliefs of the participants in the translation process: the source language speaker and addressee, the translator and the target language addressee. Translation choices may vary according to how beliefs are ascribed to the various participants and, in particular, “user-friendly” choices are based on the beliefs ascribed to the TL addressee.
Sharable Formats and Their Supporting Environments for Exchanging User Dictionaries among Different MT Systems as a Part of AAMT Activities
We, machine translation providers, as members of the Asia-Pacific Association for Machine Translation (AAMT), are now establishing environments for sharing and exchanging user dictionaries among different machine translation systems. In order for users to utilize machine translation systems more effectively, we define common formats for user dictionaries, and establish electronic environments in which users can exchange their user dictionaries using these common formats. This task started in 1996, and the formats will be fixed in March of 1998.
JEIDA’s Bilingual Corpus and Other Corpora for NLP Research in Japan
The committee on text processing technology of JEIDA (Japan Electronics Industry Development Association) has been developing its bilingual corpus for research on machine translation systems since the 1996 Japanese fiscal year. An overview of this bilingual corpus is presented in this paper. The paper also covers other linguistic data recently developed in Japan, including the RWC text database and the simple sentence data by the CRL and IPA.
Multi-Lingual Spoken Dialog Translation System Using Transfer-Driven Machine Translation
This paper describes a Transfer-Driven Machine Translation (TDMT) system as a prototype for efficient multi-lingual spoken-dialog translation. Currently, the TDMT system deals with dialogues in the travel domain, such as travel scheduling, hotel reservation, and trouble-shooting, and covers almost all expressions presented in commercially-available travel conversation guides. In addition, to put a speech dialog translation system into practical use, it is necessary to develop a mechanism that can handle the speech recognition errors. In TDMT, robust translation can be achieved by using an example-based correct parts extraction (CPE) technique to translate the plausible parts from speech recognition results even if the results have several recognition errors. We have applied TDMT to three language pairs, i.e., Japanese-English, Japanese-Korean, Japanese-German. Simulations of dialog communication between different language speakers can be provided via a TCP/IP network. In our performance evaluation for the translation of TDMT utilizing 69-87 unseen dialogs, we achieved about 70% acceptability in the JE, KJ translations, almost 60% acceptability in the EJ and JG translations, and about 90% acceptability in the JK translations. In the case of handling erroneous sentences caused by speech recognition errors, although almost all translation results end up as unacceptable translation in conventional methods, 69% of the speech translation results are improved by the CPE technique.
MT R&D in Asia
There has been a big shift in MT R&D in this region following the many large-scale projects conducted in the past ten years. The Multi-lingual Machine Translation (MMT) project is one of the significant R&D projects that greatly increased the number of NLP-related researchers and research activities, as can be seen in the growing number of research institutes in recent years. We learned a lot from collaborative research across languages, and we still hope that it will be a rigorous step for future MT R&D in this region. Though MT systems are still far from the ultimate goal of perfect translation, it can be observed that they are actually used to support information retrieval from the Internet.
Corpus-Based Statistics-Oriented (CBSO) Machine Translation Researches in Taiwan
A brief introduction to the MT research projects in Taiwan is given in this paper. Special attention is given to the increasingly popular corpus-based statistics-oriented (CBSO) approaches in MT research. In particular, the paper introduces the parameterized two-way training philosophy behind the design of the second generation BehaviorTran, which is the first and the largest operational system in this area.
An Example of MT Use by the U.S. Government
PARS/U for Windows: The World’s First Commercial English-Ukrainian and Ukrainian-English Machine Translation System
Michael S. Blekhman
The paper describes the PARS/U Ukrainian-English bidirectional MT system by Lingvistica '93 Co. PARS/U translates MS Word and HTML files as well as screen Helps. It features an easy-to-master dictionary updating program, which permits the user to customize the system by means of running subject-area oriented texts through the MT engine. PARS/U is marketed in Ukraine and North America.
From METAL to T1: Systems and Components for Machine Translation Applications
This paper describes the progress which has been made to make MT systems usable in professional environments. After many years of significant investment, it was decided that the time was ripe for the METAL machine translation system to be better positioned in the market place. Two lines of action were followed: Introducing the system onto the PC market, using the GMS-T1 as a concrete example; Reusing system components in customized solutions, using the AVENTINUS project as an example, which is a multilingual information processing application. Both lines of action have far-reaching consequences for system development. But they also create new opportunities to improve the system's capabilities and flexibility.
MT R&D in Canada
SYSTRAN MT Dictionary Development
SYSTRAN has demonstrated success in the MT field with its long history spanning nearly 30 years. As a general-purpose fully automatic MT system, SYSTRAN employs a transfer approach. Among its several components, large, carefully encoded, high-quality dictionaries are critical to SYSTRAN's translation capability. A total of over 2.4 million words and expressions are now encoded in the dictionaries for twelve source language systems (30 language pairs - one per year!). SYSTRAN's dictionaries, along with its parsers, transfer modules, and generators, have been tested on huge amounts of text, and contain large terminology databases covering various domains and detailed linguistic rules. Using these resources, SYSTRAN MT systems have successfully served practical translation needs for nearly 30 years, and built a reputation in the MT world for their large, mature dictionaries. This paper describes various aspects of SYSTRAN MT dictionary development as an important part of the development and refinement of SYSTRAN MT systems. There are 4 major sections: 1) Role and Importance of Dictionaries in the SYSTRAN Paradigm describes the importance of coverage and depth in the dictionaries; 2) Dictionary Structure discusses the specifics of dictionary structure and types of information represented; 3) Dictionary Creation and Update describes the strategy and mechanics of the dictionary development; 4) Past, Present and Future Development provides some perspective on where SYSTRAN has come from and where it is going.
MT as a Commercial Service: Three Case Studies
This paper presents three case studies showing the considerably different uses customers make of our Dutch-English MT service.
Java and Its Role in Natural Language Processing and Machine Translation
The Java programming language started as the language Oak when the World Wide Web was still being developed at CERN. It has gained popularity since its launch as a programming language capable of being used to develop applications which can run across the Internet (as well as local stand-alone programs). As with many technologies associated with the World Wide Web, there is a lot of 'hype', confusion, and misinformation. Consequently, while many researchers in the area of Natural Language Processing and Machine Translation will have heard of Java, may be considering using it, or even have got as far as their first 'Hello World' applet, they are probably not fully aware of what the implications of using this language are, and what possible role it could have in the development of computational linguistic applications, either intended to run locally on a wide range of computing platforms, or remotely across the Internet. This paper sets out to address this issue by presenting Java in a clear, concise fashion and considering how it may be used in computational linguistic applications. A requirements analysis for a generic Natural Language Processing and Machine Translation tool is undertaken to consider how Java could be used, and subsequently two example systems developed in Java (which can be accessed on the Internet) are introduced. Finally, pointers to Java resources are presented so that researchers interested in using this language can both install it and learn how to program it.
End-to-End Evaluation in VERBMOBIL I
VERBMOBIL is a speech-to-speech translation system for spoken dialogues between two speakers. The application scenario is appointment scheduling for business meetings. Both dialogue participants have at least a passive knowledge of English, which serves as intermediate language. The transfer directions are German to English and Japanese to English. A special feature of VERBMOBIL is that translations are produced on demand when the dialogue participants are unable to express themselves in English and therefore prefer to use their mother tongue. In this paper we present the criteria and the evaluation procedure for evaluating the translation quality of the VERBMOBIL prototype. The evaluated data have been produced by three concurrent processing methods that are integrated in the VERBMOBIL prototype. These processing methods differ with respect to processing depth, processing speed and translation quality. The paper is structured as follows: we start by giving a short description of the VERBMOBIL architecture focusing on the concurrent linguistic analyses and transfer processes which lead to three alternative translation outputs for each turn. In section two we outline the evaluation procedure and criteria. The third section discusses the evaluation results, and the conclusion of the paper gives an outlook to future applications of automated evaluation procedures for machine translation (MT) based on an MT architecture where several concurrent translation approaches are integrated.
Using MT in a Corporate Setting
The model of conceptual structure mapping: a psycholinguistic approach to interlingual representation
Associating semantic components with intersective Levin classes
Hoa Trang Dang
Spanish EuroWordNet and LCS-based interlingual MT
Bonnie J. Dorr
M. Antonia Martí
We present a machine translation framework in which the interlingua, Lexical Conceptual Structure (LCS), is coupled with a definitional component that includes bilingual (EuroWordNet) links between words in the source and target languages. While the links between individual words are language-specific, the LCS is designed to be a language-independent, compositional representation. We take the view that the two types of information (shallower, transfer-like knowledge as well as deeper, compositional knowledge) can be reconciled in interlingual machine translation, the former for overcoming the intractability of LCS-based lexical selection, and the latter for relating the underlying semantics of two words cross-linguistically. We describe the acquisition process for these two information types and present results of hand-verification of the acquired lexicon. Finally, we demonstrate the utility of the two information types in interlingual MT.
Toward compact monotonically compositional interlingua using lexical aspect
Bonnie J. Dorr
Mari Broman Olsen
Scott C. Thomas
We describe a theoretical investigation into the semantic space described by our interlingua (IL), which currently has 191 main verb classes divided into 434 subclasses, represented by 237 distinct Lexical Conceptual Structures (LCSs). Using the model of aspect in Olsen (1994; 1997), monotonic aspectual composition, we have identified 71 aspectually basic subclasses that are associated with one or more of 68 aspectually non-basic classes via some lexical (“type-shifting”) rule (Bresnan, 1982; Pinker, 1984; Levin and Rappaport Hovav, 1995). This allows us to refine the IL and address certain computational and theoretical issues at the same time. (1) From a linguistic viewpoint, the expected benefits include a refinement of the aspectual model in (Olsen, 1994; Olsen, 1997) (which provides necessary but not sufficient conditions for aspectual composition), and a refinement of the verb classifications in (Levin, 1993); we also expect our approach to eventually produce a systematic definition (in terms of LCSs and compositional operations) of the precise meaning components responsible for Levin's classification. (2) Computationally, the lexicon is made more compact.
On representing language-specific information in interlingua
Time to eat peaches: language-specific information in interlingua
Improving the precision of lexicon-to-ontology alignment algorithms
Latifur R. Khan
Eduard H. Hovy
Interlingua developed and utilized in real multilingual MT product systems
This paper describes characteristics of an interlingua we have developed. It contains a large lexicon and has been tested on actual MT systems in the translation of large volumes of actual documents. The main characteristics of the interlingua are as follows: (1) Conceptual primitives, elements of the interlingua, can be linked to any parts of speech in English or Japanese. (2) Positions of the top node on the interlingua correspond to differences in syntactic structures. (3) Two or more conceptual graphs can be used for expressing the same concept, and can be converted to another by conceptual transformation rules which are independent of any specific language. (4) Conceptual primitives are divided into two classes; (a) functional conceptual primitives, which are finite and manageable and constitute, along with rules for interpreting conceptual graphs, the grammar of the interlingua, and (b) general conceptual primitives, which correspond to specific words in actual languages and which, depending on the direction of translation, may or may not be used. Our commercial MT products using the interlingua produce results of roughly the same or higher quality than systems using the syntactic transfer method, which fact indicates the feasibility of the interlingua approach.
Simplification of nomenclature leads to an ideal IL for human language communication
The use of pegs computational discourse framework as an interlingua representation
EDR’s concept classification and description for interlingual representation
This paper describes the outline of the EDR Concept Dictionary and gives some examples of interlingual representations as the semantic representations for an input sentence.
Enriching lexical transfer with cross-linguistic semantic features or how to do interlingua without interlingua
Using WordNet to posit hierarchical structure in Levin’s verb classes
Mari Broman Olsen
Bonnie J. Dorr
David J. Clark
In this paper we report on experiments using WordNet synset tags to evaluate the semantic properties of the verb classes cataloged by Levin (1993). This paper represents ongoing research begun at the University of Pennsylvania (Rosenzweig and Dang, 1997; Palmer, Rosenzweig, and Dang, 1997) and the University of Maryland (Dorr and Jones, 1996b; Dorr and Jones, 1996a; Dorr and Jones, 1996c). Using WordNet sense tags to constrain the intersection of Levin classes, we avoid spurious class intersections introduced by homonymy and polysemy (run a bath, run a mile). By adding class intersections based on a single shared sense-tagged word, we minimize the impact of the non-exhaustiveness of Levin’s database (Dorr and Olsen, 1996; Dorr, To appear). By examining the syntactic properties of the intersective classes, we provide a clearer picture of the relationship between WordNet/EuroWordNet and the LCS interlingua for machine translation and other NLP applications.
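The effect of sense tags on class intersection can be shown with a small sketch. It compares intersecting verb classes by bare lemma with intersecting them by (lemma, sense tag) pairs, so that the two senses of "run" no longer link unrelated classes. The class names, memberships and sense identifiers below are invented for illustration and are not taken from Levin's database or from WordNet; the functions are likewise hypothetical and only approximate the idea described in the abstract.

```python
from itertools import combinations

# Hypothetical toy classes: each member is a (lemma, sense_tag) pair.
# Names and sense tags are made up for this example.
CLASSES = {
    "motion": {("run", "run#1"), ("walk", "walk#1")},
    "preparing": {("run", "run#2"), ("pour", "pour#1")},
    "hurry": {("run", "run#1"), ("rush", "rush#1")},
}


def intersect_by_lemma(classes):
    """Pair up classes that share a lemma, ignoring sense (prone to spurious links)."""
    result = {}
    for (a, members_a), (b, members_b) in combinations(sorted(classes.items()), 2):
        shared = {lemma for lemma, _ in members_a} & {lemma for lemma, _ in members_b}
        if shared:
            result[(a, b)] = shared
    return result


def intersect_by_sense(classes):
    """Pair up classes only when they share the same sense-tagged word."""
    result = {}
    for (a, members_a), (b, members_b) in combinations(sorted(classes.items()), 2):
        shared = members_a & members_b
        if shared:
            result[(a, b)] = shared
    return result
```

With this toy data, the lemma-only version links all three classes through "run", whereas the sense-constrained version keeps only the pair of classes that share the same sense of "run". A single shared sense-tagged member is enough to form an intersection here, mirroring the single-word criterion mentioned above.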