2014
Evaluating the effects of interactivity in a post-editing workbench
Nancy Underwood | Bartolomé Mesa-Lao | Mercedes García Martínez | Michael Carl | Vicent Alabau | Jesús González-Rubio | Luis A. Leiva | Germán Sanchis-Trilles | Daniel Ortíz-Martínez | Francisco Casacuberta
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper describes the field trial and subsequent evaluation of a post-editing workbench currently under development in the EU-funded CasMaCat project. Based on user evaluations of the initial prototype, this second prototype of the workbench includes a number of interactive features designed to improve productivity and user satisfaction. Using CasMaCat’s own facilities for logging keystrokes and eye tracking, data were collected from nine post-editors in a professional setting. These data were then used to investigate the effects of the interactive features on productivity, quality, user satisfaction and cognitive load as reflected in the post-editors’ gaze activity. These quantitative results are combined with the qualitative results derived from user questionnaires and interviews conducted with all the participants.
2006
A Model for Context-Based Evaluation of Language Processing Systems and its Application to Machine Translation Evaluation
Andrei Popescu-Belis | Paula Estrella | Margaret King | Nancy Underwood
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
In this paper, we propose a formal framework that takes into account the influence of the intended context of use of an NLP system on the procedure and the metrics used to evaluate the system. In particular, we introduce the notion of a context-dependent quality model and explain how it can be adapted to a given context of use. More specifically, we define vector-space representations of contexts of use and of quality models, which are connected by a generic contextual quality model (GCQM). For each domain, experts in evaluation are needed to build a GCQM based on analytic knowledge and on previous evaluations, using the mechanism proposed here. The main source of inspiration for this work is the FEMTI framework for the evaluation of machine translation, which partly implements the present model and which is described briefly along with insights from other domains.
ROTE: A Tool to Support Users in Defining the Relative Importance of Quality Characteristics
Agnes Lisowska | Nancy L. Underwood
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper describes the Relative Ordering Tool for Evaluation (ROTE) which is designed to support the process of building a parameterised quality model for evaluation. It is a very simple tool which enables users to specify the relative importance of quality characteristics (and associated metrics) to reflect the users' particular requirements. The tool allows users to order any number of quality characteristics by comparing them in a pair-wise fashion. The tool was developed in the context of a collaborative project developing a text mining system. A full scale evaluation of the text mining system was designed and executed for three different users and the ROTE tool was successfully applied by those users during that process. The tool will be made available for general use by the evaluation community.
Evaluating Symbiotic Systems: the challenge
Margaret King | Nancy Underwood
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper looks at a class of systems which pose severe problems in evaluation design for current conventional approaches to evaluation. After describing the two conventional evaluation paradigms, the functionality paradigm typified by evaluation campaigns and the ISO-inspired user-centred paradigm typified by the work of the EAGLES and ISLE projects, it goes on to outline the problems posed by the evaluation of systems which are designed to work in critical interaction with a human expert user and over vast amounts of data. These systems pose problems for both paradigms, although for different reasons. The primary aim of this paper is to provoke discussion and the search for solutions; we have no proven solutions at present. However, we describe a programme of exploratory research on which we have already embarked, involving ground-clearing work that we expect to result in a deep understanding of the systems and users, a prerequisite for developing a general framework for evaluation in this field.
The Evolution of an Evaluation Framework for a Text Mining System
Nancy L. Underwood | Agnes Lisowska
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
The Parmenides project developed a text mining application applied in three different domains exemplified by case studies for the three user partners in the project. During the lifetime of the project (and in parallel with the development of the system itself) an evaluation framework was developed by the authors in conjunction with the users, and was eventually applied to the system. The object of the exercise was two-fold: firstly to develop and perform a complete user-centered evaluation of the system to assess how well it answered the users' requirements and, secondly, to develop a general framework which could be applied in the context of other users' requirements and (with some modification) to similar systems. In this paper we describe not only the framework but the process of building and parameterising the quality model for each case study and, perhaps most interestingly, the way in which the quality model and users' requirements and expectations evolved over time.
2005
Finding the System that Suits You Best: Towards the Normalization of MT Evaluation
Paula Estrella | Andrei Popescu-Belis | Nancy Underwood
Translating and the Computer 27
2001
Translatability checker: a tool to help decide whether to use MT
Nancy Underwood | Bart Jongejan
Proceedings of Machine Translation Summit VIII
This paper describes a tool designed to assess the machine translatability of English source texts by assigning a translatability index to both individual sentences and the text as a whole. The tool is designed both to stand alone and to be integrated into a suite of other tools which together help to improve the quality of professional translation in the preparatory phase of the translation workflow. Assessing translatability is an important element in ensuring the most efficient and cost-effective use of current translation technology, and the tool must be able to determine the translatability of a text quickly without itself using too many resources. It is therefore based on rather simple tagging and pattern-matching technologies, which bring with them a certain level of indeterminacy. This potential disadvantage can, however, be offset by the fact that an annotated version of the text is produced at the same time, allowing the user to interpret the results of the checker.
Evaluating machine translation output for an unknown source language: report of an ISLE-based investigation
Keith J. Miller | Donna M. Gates | Nancy Underwood | Josemina Magdalen
Workshop on MT Evaluation
It is often assumed that knowledge of both the source and target languages is necessary in order to evaluate the output of a machine translation (MT) system. This paper reports on an experimental evaluation of Chinese-English and Spanish-English MT output designed specifically for evaluators who do not read or speak Chinese or Spanish. An outline of the characteristics measured and the evaluation follows.
1999
Profiling translation projects: an essential part of routing translations
Nancy L. Underwood | Bart Jongejan
Proceedings of the 8th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages