Context-Driven and Reference-Guided Data Augmentation for Subtitle Translation
Hitoshi Ito, Naoto Shirai, Kazutaka Kinugawa, Hideya Mino, Rei Endo, Yoshihiko Kawai
Abstract
Large language models (LLMs) have demonstrated strong performance in translation tasks. Subtitle translation presents unique challenges, such as preserving the original work’s worldview and the distinctive speaking styles of its characters. Achieving high-quality translations that reflect these stylistic nuances typically requires bilingual data for a specific movie, which is often scarce or unavailable. Thus, we propose a data augmentation method that uses LLMs to improve translation performance for specific movies, even when only a few hundred bilingual sentence pairs are available. The method expands source-side data by rewriting original subtitles using information that can be extracted from the context, such as character profiles and scene descriptions, to maintain the tone and thematic consistency of the movie. For translation, the augmented sentences are aligned with manually translated originals using structural similarity, which enables style-preserving bilingual data generation via one-shot learning. Experimental results show that data augmented using the proposed method effectively improves BLEU scores for film subtitle translation, and achieves superior stylistic quality in human evaluation.
- Anthology ID:
- 2026.findings-acl.2059
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 41381–41394
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2059/
- Cite (ACL):
- Hitoshi Ito, Naoto Shirai, Kazutaka Kinugawa, Hideya Mino, Rei Endo, and Yoshihiko Kawai. 2026. Context-Driven and Reference-Guided Data Augmentation for Subtitle Translation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 41381–41394, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Context-Driven and Reference-Guided Data Augmentation for Subtitle Translation (Ito et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2059.pdf