SubDocTrans: Enhancing Document-level Machine Translation with Plug-and-play Multi-granularity Knowledge Augmentation

Hanghai Hong, Yibo Xie, Jiawei Zheng, Xiaoli Wang


Abstract
Large language models (LLMs) have recently achieved remarkable progress in sentence-level machine translation, but scaling to document-level machine translation (DocMT) remains challenging, particularly in modeling long-range dependencies and discourse phenomena across sentences and paragraphs. Document translations generated by LLMs often suffer from poor consistency, weak coherence, and omission errors. To address these issues, we propose SubDocTrans, a novel DocMT framework that enables LLMs to produce high-quality translations through plug-and-play, multi-granularity knowledge extraction and integration. SubDocTrans first performs topic segmentation to divide a document into coherent topic sub-documents. For each sub-document, it extracts both global and local knowledge, including a bilingual summary, theme, proper nouns, topics, and a transition hint. We then incorporate this multi-granularity knowledge into the prompting strategy to guide LLMs in producing consistent, coherent, and accurate translations. We conduct extensive experiments across various DocMT tasks, and the results demonstrate the effectiveness of our framework, particularly in improving consistency and coherence, reducing omission errors, and mitigating hallucinations.
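As a rough illustration of the prompting step described in the abstract, the sketch below assembles a translation prompt for one topic sub-document from the knowledge types the abstract names. It is not the authors' implementation: the class `SubDocKnowledge`, the function `build_prompt`, the field names, and the prompt wording are all assumptions made for this example, and topic segmentation and knowledge extraction are assumed to have already happened upstream.

```python
from dataclasses import dataclass, field


@dataclass
class SubDocKnowledge:
    """Hypothetical container for the knowledge types listed in the abstract."""
    summary_src: str   # summary in the source language
    summary_tgt: str   # summary in the target language
    theme: str
    # Assumed representation: source-language term -> target-language rendering.
    proper_nouns: dict[str, str] = field(default_factory=dict)
    topics: list[str] = field(default_factory=list)
    transition_hint: str = ""


def build_prompt(sub_doc: str, k: SubDocKnowledge,
                 src_lang: str, tgt_lang: str) -> str:
    """Assemble one prompt for a single sub-document (illustrative wording)."""
    glossary = "\n".join(f"- {s} => {t}" for s, t in k.proper_nouns.items())
    parts = [
        f"Translate the following text from {src_lang} to {tgt_lang}.",
        f"Theme: {k.theme}",
        f"Summary ({src_lang}): {k.summary_src}",
        f"Summary ({tgt_lang}): {k.summary_tgt}",
        f"Topics: {', '.join(k.topics)}",
        f"Proper nouns (use these renderings consistently):\n{glossary}",
    ]
    # The transition hint is optional in this sketch.
    if k.transition_hint:
        parts.append(f"Transition hint: {k.transition_hint}")
    parts.append(f"Text:\n{sub_doc}")
    return "\n\n".join(parts)
```

Calling `build_prompt` once per sub-document and concatenating the model outputs in order would yield the document translation under these assumptions; how the actual framework formats or orders this information is not specified in the abstract.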
Anthology ID:
2025.findings-emnlp.782
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
14490–14506
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.782/
DOI:
10.18653/v1/2025.findings-emnlp.782
Cite (ACL):
Hanghai Hong, Yibo Xie, Jiawei Zheng, and Xiaoli Wang. 2025. SubDocTrans: Enhancing Document-level Machine Translation with Plug-and-play Multi-granularity Knowledge Augmentation. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 14490–14506, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
SubDocTrans: Enhancing Document-level Machine Translation with Plug-and-play Multi-granularity Knowledge Augmentation (Hong et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.782.pdf
Checklist:
 2025.findings-emnlp.782.checklist.pdf