Rita Nübel

Also published as: Rita. Nübel


Evaluating language technologies
Jörg Schütz | Rita Nübel
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers

In this paper we report on ongoing verification and validation work within the MULTIDOC project, which is situated in the field of multilingual automotive product documentation. One central task is the evaluation of existing off-the-shelf and research-based language technology (LT) products and components for the purpose of supporting, or even reorganising, the documentation production chain along three diagnostic dimensions: the process proper, the documentation quality, and the translatability of the process output. In this application scenario, LT components are intended to ensure that predefined quality criteria can be applied and measured, both for the documentation end-product and for the information objects that form its basic building blocks. Multilinguality is of crucial importance here: it must be introduced, or at least prepared for, as early as possible in the documentation workflow, and maintained throughout, to ensure a better and faster translation process. A prerequisite for the evaluation process is a thorough definition of these dimensions in terms of user quality requirements and LT developer quality requirements. In our approach, we define the output quality of the whole documentation process as the pivot where user requirements and developer requirements meet. For this, a so-called “braided” diagnostic evaluation turned out to be well suited to cover both views. Since no generally approved standards, or even valid specifications for standards, exist for the evaluation of LT products, we have adapted existing standards for the evaluation of software products, in particular ISO 9001, ISO 9000-3, ISO/IEC 12119, ISO 9004 and ISO 9126. This is feasible because an LT product consists of a software part and a lingware part; the adaptation had to be accomplished for the latter.


End-to-End Evaluation in VERBMOBIL I
Rita Nübel
Proceedings of Machine Translation Summit VI: Papers

VERBMOBIL is a speech-to-speech translation system for spoken dialogues between two speakers. The application scenario is appointment scheduling for business meetings. Both dialogue participants have at least a passive knowledge of English, which serves as the intermediate language. The transfer directions are German to English and Japanese to English. A special feature of VERBMOBIL is that translations are produced on demand, when the dialogue participants are unable to express themselves in English and therefore prefer to use their mother tongue. In this paper we present the criteria and the procedure for evaluating the translation quality of the VERBMOBIL prototype. The evaluated data have been produced by three concurrent processing methods that are integrated in the VERBMOBIL prototype and that differ with respect to processing depth, processing speed and translation quality ([2], p. 2). The paper is structured as follows: we start with a short description of the VERBMOBIL architecture, focusing on the concurrent linguistic analyses and transfer processes that lead to three alternative translation outputs for each turn. In section two we outline the evaluation procedure and criteria. The third section discusses the evaluation results, and the conclusion gives an outlook on future applications of automated evaluation procedures for machine translation (MT) based on an MT architecture in which several concurrent translation approaches are integrated.