Taemin Yeom


2025

Tagged Span Annotation for Detecting Translation Errors in Reasoning LLMs
Taemin Yeom | Yonghyun Ryu | Yoonjung Choi | Jinyeong Bak
Proceedings of the Tenth Conference on Machine Translation

We present the AIP team’s submission to the WMT 2025 Unified MT Evaluation Shared Task, focusing on the span-level error detection subtask. Our system emphasizes response format design to better harness the capabilities of OpenAI’s o3, the state-of-the-art reasoning LLM. To this end, we introduce Tagged Span Annotation (TSA), an annotation scheme designed to more accurately extract span-level information from the LLM. On our refined version of the WMT24 ESA dataset, our reference-free method achieves an F1 score of approximately 27 for character-level label prediction, outperforming the reference-based XCOMET-XXL, which scores approximately 17.
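
To make "character-level label prediction" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical inline tag format (`<err>...</err>`) that the abstract does not specify, and it scores with a plain binary character-level F1, which may differ from the shared task's official metric. The names `tagged_to_labels` and `char_f1` are illustrative only.

```python
import re

# Hypothetical tag format: the abstract does not define TSA's actual tags.
TAG = re.compile(r"</?err>")


def tagged_to_labels(tagged: str) -> list[int]:
    """Return one 0/1 label per character of the text with tags removed."""
    labels: list[int] = []
    inside = False  # True while we are between <err> and </err>
    pos = 0
    for match in TAG.finditer(tagged):
        # Label the plain text that precedes this tag.
        labels.extend([int(inside)] * (match.start() - pos))
        inside = match.group() == "<err>"
        pos = match.end()
    # Label whatever text follows the last tag.
    labels.extend([int(inside)] * (len(tagged) - pos))
    return labels


def char_f1(pred: list[int], gold: list[int]) -> float:
    """Binary character-level F1 (assumed definition, for illustration)."""
    if len(pred) != len(gold):
        raise ValueError("label sequences must have the same length")
    tp = sum(1 for p, g in zip(pred, gold) if p and g)
    fp = sum(1 for p, g in zip(pred, gold) if p and not g)
    fn = sum(1 for p, g in zip(pred, gold) if not p and g)
    if tp == 0:
        # Covers the no-overlap case and the case with no marked characters.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


pred = tagged_to_labels("a <err>bad</err> cat")
gold = tagged_to_labels("a <err>bad c</err>at")
score = char_f1(pred, gold)
```

In this toy case the predicted span covers three of the five gold error characters with no false positives, so precision is 1.0, recall is 0.6, and F1 is 0.75. Because the tags are stripped before labeling, both sequences align with the same untagged translation string.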