Andrew Schonebaum




2025

pdf bib
Evaluating Evaluation Metrics for Ancient Chinese to English Machine Translation
Eric R. Bennett | HyoJung Han | Xinchen Yang | Andrew Schonebaum | Marine Carpuat
Proceedings of the Second Workshop on Ancient Language Processing

Evaluation metrics are an important driver of progress in Machine Translation (MT), but they have been primarily validated on high-resource modern languages. In this paper, we conduct an empirical evaluation of metrics commonly used to evaluate MT from Ancient Chinese into English. Using LLMs, we construct a contrastive test set, pairing high-quality MT and purposefully flawed MT of the same Pre-Qin texts. We then evaluate the ability of each metric to discriminate between accurate and flawed translations.
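As an illustration of the contrastive evaluation setup described in the abstract, the minimal sketch below shows how pairwise discrimination accuracy could be computed: for each (reference, accurate MT, flawed MT) triple, count how often a metric scores the accurate translation higher. The metric choice (sacreBLEU's sentence-level BLEU), the data layout, and the example pair are assumptions for illustration only, not the paper's actual implementation.

```python
# Illustrative sketch only: pairwise discrimination accuracy over a
# contrastive test set. Metric, data fields, and example are assumed,
# not taken from the paper.
from sacrebleu.metrics import BLEU

def discrimination_accuracy(pairs, metric):
    """pairs: list of (reference, good_mt, flawed_mt) triples.
    Returns the fraction of pairs where the metric scores the
    accurate translation above the purposefully flawed one."""
    wins = 0
    for reference, good_mt, flawed_mt in pairs:
        good_score = metric.sentence_score(good_mt, [reference]).score
        flawed_score = metric.sentence_score(flawed_mt, [reference]).score
        if good_score > flawed_score:
            wins += 1
    return wins / len(pairs)

# Hypothetical contrastive pair (invented for illustration).
pairs = [
    ("The Master said: To learn and then practice it in due time, is that not a pleasure?",
     "The Master said: Is it not a pleasure to learn and practice what one has learned?",
     "The Master said: Is it not a sorrow to learn and then forget what one has learned?"),
]

print(discrimination_accuracy(pairs, BLEU(effective_order=True)))
```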