Combining Evaluation Metrics via Loss Functions

Calandra Tate, Clare Voss


Abstract
When response metrics for evaluating the utility of machine translation (MT) output on a given task do not yield a single ranking of MT engines, how are MT users to decide which engine best supports their task? When the costs of different types of response errors vary, how are MT users to factor that information into their rankings? What impact do different costs have on response-based rankings? Starting with data from an extraction experiment detailed in Voss and Tate (2006), this paper describes three response-rate metrics developed to quantify different aspects of MT users’ performance identifying who/when/where-items in MT output, and then presents a loss function analysis over these rates to derive a single customizable metric, applying a range of values to correct responses and costs to different error types. For the given experimental dataset, loss function analyses provided a clearer characterization of the engines’ relative strengths than did comparing the response rates to each other. For one MT engine, varying the costs had no impact: the engine consistently ranked best. By contrast, cost variations did impact the ranking of the other two engines: a rank reversal occurred on who-item extractions when incorrect responses were penalized more than non-responses. Future work with loss analysis, developing operational cost ratios of error rates to correct response rates, will require user studies and expert document-screening personnel to establish baseline values for effective MT engine support on wh-item extraction.
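The following is a minimal sketch, not the paper's exact formulation, of how response rates might be combined into a single cost-weighted loss score per engine. All rates, engine names, and cost values below are illustrative assumptions; the point is only to show how varying the cost assigned to incorrect responses relative to non-responses can reverse an engine ranking.

```python
# Hypothetical sketch: combining per-engine response rates into a single
# loss score with user-specified costs (illustrative, not the paper's formula).

def loss_score(correct_rate, error_rate, nonresponse_rate,
               gain_correct=1.0, cost_error=1.0, cost_nonresponse=0.5):
    """Lower is better: penalize errors and non-responses, reward correct extractions."""
    return (cost_error * error_rate
            + cost_nonresponse * nonresponse_rate
            - gain_correct * correct_rate)

# Made-up who-item rates (correct, error, non-response) for three engines.
engines = {
    "Engine A": (0.60, 0.25, 0.15),
    "Engine B": (0.55, 0.15, 0.30),
    "Engine C": (0.50, 0.10, 0.40),
}

# Rank engines under two cost settings: errors weighted equally with
# non-responses vs. penalized more heavily, where a rank reversal may appear.
for cost_error in (1.0, 2.0):
    ranked = sorted(engines,
                    key=lambda e: loss_score(*engines[e], cost_error=cost_error))
    print(f"cost_error={cost_error}: {ranked}")
```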
Anthology ID:
2006.amta-papers.27
Volume:
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers
Month:
August 8-12
Year:
2006
Address:
Cambridge, Massachusetts, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
242–250
URL:
https://aclanthology.org/2006.amta-papers.27
Cite (ACL):
Calandra Tate and Clare Voss. 2006. Combining Evaluation Metrics via Loss Functions. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers, pages 242–250, Cambridge, Massachusetts, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Combining Evaluation Metrics via Loss Functions (Tate & Voss, AMTA 2006)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2006.amta-papers.27.pdf