Jörg Schütz
Post-editing (PE) is a necessary process in every MT deployment environment. The competences needed for PE are traditionally seen as a subset of a human translator's competence. Meanwhile, some companies accept that the PE process involves self-standing linguistic tasks, which need their own training efforts and appropriate software tool support. To date, we still lack qualitative and quantitative records of PE user activity that adequately describe the tasks accomplished and, in particular, the human cognitive processes involved. This data is needed to effectively model, design and implement supportive software systems which, on the one hand, efficiently guide the human post-editor and enhance her cognitive capabilities, and, on the other hand, influence the translation performance and competence of the employed MT system. In this paper we argue for a framework of practices that describes the PE process by correlating data obtained in laboratory experiments, augmented by additional data from resources such as interviews and mathematical prediction models, with the tasks fulfilled, and that models the identified process in a multi-faceted fashion as a basis for the implementation of a human PE-aware interactive software system.
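The kind of PE user-activity record the abstract calls for can be sketched as a simple event log. The following is a minimal, hypothetical sketch (the class and field names are illustrative, not taken from the paper); it logs timestamped PE events per segment and counts long pauses, a common simplified proxy for cognitive effort in PE research.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PEEvent:
    """One logged post-editing action (names illustrative)."""
    timestamp: float   # seconds since session start
    segment_id: int    # MT segment being post-edited
    action: str        # e.g. "keystroke", "pause", "gaze", "accept"
    payload: str = ""  # inserted/deleted text, fixation target, etc.

@dataclass
class PESession:
    events: List[PEEvent] = field(default_factory=list)

    def log(self, event: PEEvent) -> None:
        self.events.append(event)

    def pauses(self, threshold: float = 2.0) -> int:
        """Count inter-event gaps above a threshold -- a simplified
        proxy for cognitively demanding stretches of post-editing."""
        times = sorted(e.timestamp for e in self.events)
        return sum(1 for a, b in zip(times, times[1:]) if b - a > threshold)
```

Such a log would then be correlated with interview data and prediction models, as the paper proposes, rather than analysed in isolation.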
Corpus-based MT systems that analyse and generalise texts beyond the surface forms of words require generation tools to re-generate the various internal representations into valid target language (TL) sentences. While the generation of word-forms from lemmas is probably the last step of every text generation process, token generation cannot be accomplished without structural and morpho-syntactic knowledge of the sentence to be generated. As in many other MT models, this knowledge is composed of a target language model and a bag of information transferred from the source language. In this paper we establish an abstracted, linguistically informed target language model. We use a tagger, a lemmatiser and a parser to infer a template grammar from the TL corpus. Given a linguistically informed TL model, the aim is to see what needs to be provided by the transfer module for generation. During computation of the template grammar, we simultaneously build up for each TL sentence the content of the bag such that the sentence can be deterministically reproduced. In this way we control the completeness of the approach and gain an idea of which pieces of information we need to code in the TL bag.
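The core idea of pairing a template with a bag can be illustrated in a few lines. This is a deliberately simplified sketch, not the paper's implementation: tagger and lemmatiser output is stubbed as (token, lemma, tag) triples, the template is just the POS-tag sequence, and the bag stores full word-forms in place of real morphological generation, so that the surface sentence is deterministically recoverable.

```python
def abstract(analysed_sentence):
    """Split an analysed sentence into (template, bag).
    analysed_sentence: list of (token, lemma, tag) triples,
    standing in for tagger/lemmatiser/parser output."""
    template = tuple(tag for _, _, tag in analysed_sentence)
    bag = [(lemma, token) for token, lemma, _ in analysed_sentence]
    return template, bag

def regenerate(template, bag):
    """Deterministically reproduce the surface sentence; here the bag
    carries full word-forms, a stand-in for morphological generation
    from lemmas."""
    assert len(template) == len(bag)
    return " ".join(token for _, token in bag)

analysed = [("The", "the", "DET"), ("cars", "car", "NOUN"),
            ("run", "run", "VERB")]
tpl, bag = abstract(analysed)
```

Checking that `regenerate(tpl, bag)` returns the original sentence is exactly the completeness control the abstract describes: whatever cannot be reproduced from template plus bag marks information the transfer module must supply.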
In this paper, organized in essay style, I first assess the situation of Machine Translation, which is characterized, on the one hand, by unsatisfied user expectations and, on the other hand, by an ever increasing need for translation technology to fulfil the promises of the global knowledge society, which is promoted by almost all governments and industries worldwide. The assessment is followed by an outline of a blueprint that describes possible steps of an MT evolution regarding short-term, mid-term and long-term developments. Although some user communities might aim at an MT revolution, the evolutionary implementation of the different aspects of the blueprint fits seamlessly with the foundation that we are faced with in the assessment part. With the blueprint the thesis of this MT evolution essay is established, and the stage is opened for the antithesis, in which I develop the points for an MT revolution. Finally, in the synthesis part I develop a combined view which completes the discussion and establishes a blueprint for MT evolution.
This paper provides a nutshell description of how the recently published proposal of a translation quality metric for automotive service information is applicable in an evaluation scenario that deploys multilingual human language technology (mHLT). This proposal is the result of the J2450 task force group of the Society of Automotive Engineers (SAE). The main focus of the developed metric is on the syntactic level of a translation product. Since it is our belief that any evaluation of a translation (human and machine) should also take into account the semantic level of a human language product, we have slightly reshaped the SAE J2450 metric. In addition, we have embedded the whole evaluation process into an object-oriented quality model approach to account for the established business processes in the acquisition, production, translation and dissemination of automotive service information in SGML/XML environments. This scenario then provides the solid grounding for the setup of a quality assurance process for all dimensions related to the processing (human and machine) of automotive service information. The work reported here is one part of the ongoing European Multidoc project that has brought together several European automotive companies to tame the complexity of service information products in an integrated way. Within Multidoc, integration means first and foremost the coupling of advanced information technology and mHLT. These aspects will be further motivated and detailed in the context of the specification of an evaluation scenario.
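The general shape of a J2450-style metric is a weighted error count normalised by segment length: each error is assigned a category and a severity, and the score is the weighted sum divided by the number of source words. The sketch below shows this mechanism only; the category names and weights are illustrative placeholders, not the official SAE J2450 values or the paper's reshaped variant.

```python
# Illustrative category/severity weights (not the official SAE values).
WEIGHTS = {
    ("wrong_term", "serious"): 5, ("wrong_term", "minor"): 2,
    ("syntactic", "serious"): 4, ("syntactic", "minor"): 2,
    ("omission", "serious"): 4, ("omission", "minor"): 2,
    ("misspelling", "serious"): 3, ("misspelling", "minor"): 1,
}

def quality_score(errors, source_word_count):
    """errors: list of (category, severity) pairs found by an evaluator.
    Returns the normalised weighted error score (lower is better)."""
    total = sum(WEIGHTS[e] for e in errors)
    return total / source_word_count
```

For example, one serious terminology error plus one minor misspelling in a 100-word segment would yield (5 + 1) / 100 = 0.06 under these illustrative weights.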
In this paper we report on ongoing verification and validation work within the MULTIDOC project. This project is situated in the field of multilingual automotive product documentation. One central task is the evaluation of existing off-the-shelf and research-based language technology (LT) products and components for the purpose of supporting or even reorganising the documentation production chain along three diagnostic dimensions: the process proper, the documentation quality and the translatability of the process output. In this application scenario, LT components shall control and ensure that predefined quality criteria are applicable and measurable with respect to the documentation end-product as well as to the information objects that form the basic building blocks of the end-product. In this scenario, multilinguality is of crucial importance. It shall be introduced or prepared, and maintained, as early as possible in the documentation workflow to ensure a better and faster translation process. A prerequisite for the evaluation process is the thorough definition of these dimensions in terms of user quality requirements and LT developer quality requirements. In our approach, we define the output quality of the whole documentation process as the pivot where user requirements and developer requirements shall meet. For this, it turned out that a so-called “braided” diagnostic evaluation is very well suited to cover both views. Since no generally approved standards, or even valid specifications for standards, exist for the evaluation of LT products, we have adjusted existing standards for the evaluation of software products, in particular ISO 9001, ISO 9000-3, ISO/IEC 12119, ISO 9004 and ISO 9126. This is feasible because an LT product consists of a software part and a lingware part. The adaptation had to be accomplished for the latter part.
This paper outlines a new architecture for a NLP/MT development environment for the EUROTRA project, which will be fully operational in the 1993-94 time frame. The proposed architecture provides a powerful and flexible platform for extensions and enhancements to the existing EUROTRA translation philosophy and the linguistic work done so far, thus allowing the reusability of existing grammatical and lexical resources, while ensuring the suitability of EUROTRA methods and tools for other NLP/MT system developers and researchers.