@inproceedings{dinh-etal-2024-quality,
    title = "Quality Estimation with $k$-nearest Neighbors and Automatic Evaluation for Model-specific Quality Estimation",
    author = "Dinh, Tu Anh  and
      Palzer, Tobias  and
      Niehues, Jan",
    editor = "Scarton, Carolina  and
      Prescott, Charlotte  and
      Bayliss, Chris  and
      Oakley, Chris  and
      Wright, Joanna  and
      Wrigley, Stuart  and
      Song, Xingyi  and
      Gow-Smith, Edward  and
      Bawden, Rachel  and
      S{\'a}nchez-Cartagena, V{\'i}ctor M  and
      Cadwell, Patrick  and
      Lapshinova-Koltunski, Ekaterina  and
      Cabarr{\~a}o, Vera  and
      Chatzitheodorou, Konstantinos  and
      Nurminen, Mary  and
      Kanojia, Diptesh  and
      Moniz, Helena",
    booktitle = "Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)",
    month = jun,
    year = "2024",
    address = "Sheffield, UK",
    publisher = "European Association for Machine Translation (EAMT)",
    url = "https://preview.aclanthology.org/ingest-emnlp/2024.eamt-1.14/",
    pages = "133--146",
    abstract = "Providing quality scores along with Machine Translation (MT) output, so-called reference-free Quality Estimation (QE), is crucial to inform users about the reliability of the translation. We propose a model-specific, unsupervised QE approach, termed $k$NN-QE, that extracts information from the MT model{'}s training data using $k$-nearest neighbors. Measuring the performance of model-specific QE is not straightforward, since such methods provide quality scores on their own MT output and thus cannot be evaluated using benchmark QE test sets containing human quality scores on premade MT output. Therefore, we propose an automatic evaluation method that uses quality scores from reference-based metrics as gold standard instead of human-generated ones. We are the first to conduct detailed analyses and conclude that this automatic method is sufficient, and the reference-based MetricX-23 is best for the task."
}