@inproceedings{stymne-ahrenberg-2012-practice,
    title = "On the practice of error analysis for machine translation evaluation",
    author = "Stymne, Sara  and
      Ahrenberg, Lars",
    editor = "Calzolari, Nicoletta  and
      Choukri, Khalid  and
      Declerck, Thierry  and
      Do{\u{g}}an, Mehmet U{\u{g}}ur  and
      Maegaard, Bente  and
      Mariani, Joseph  and
      Moreno, Asuncion  and
      Odijk, Jan  and
      Piperidis, Stelios",
    booktitle = "Proceedings of the Eighth International Conference on Language Resources and Evaluation ({LREC}'12)",
    month = may,
    year = "2012",
    address = "Istanbul, Turkey",
    publisher = "European Language Resources Association (ELRA)",
    url = "https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/L12-1417/",
    pages = "1785--1790",
    abstract = "Error analysis is a means to assess machine translation output in qualitative terms, which can be used as a basis for the generation of error profiles for different systems. As for other subjective approaches to evaluation it runs the risk of low inter-annotator agreement, but very often in papers applying error analysis to MT, this aspect is not even discussed. In this paper, we report results from a comparative evaluation of two systems where agreement initially was low, and discuss the different ways we used to improve it. We compared the effects of using more or less fine-grained taxonomies, and the possibility to restrict analysis to short sentences only. We report results on inter-annotator agreement before and after measures were taken, on error categories that are most likely to be confused, and on the possibility to establish error profiles also in the absence of a high inter-annotator agreement."
}Markdown (Informal)
[On the practice of error analysis for machine translation evaluation](https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/L12-1417/) (Stymne & Ahrenberg, LREC 2012)
ACL