Xiangyao Ma
2026
BabelDOC: Better Layout-Preserving PDF Translation via Intermediate Representation
Yang Qi | Xiangyao Ma | Xiao Wang | Hao Wang | Rui Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
As global cross-lingual communication intensifies, language barriers in visually rich documents such as PDFs remain a practical bottleneck. Existing document translation pipelines face a tension between linguistic processing and layout preservation: text-oriented Computer-Assisted Translation (CAT) systems often discard structural metadata, while document parsers focus on extraction and do not support faithful re-rendering after translation. We introduce BabelDOC, an Intermediate Representation (IR)-based framework for layout-preserving PDF translation. BabelDOC decouples visual layout metadata from semantic content, enabling document-level translation operations such as terminology extraction, cross-page context handling, glossary-constrained generation, and formula placeholdering. The translated content is then re-anchored to the original layout through an adaptive typesetting engine. Experiments on a curated 200-page benchmark, together with human evaluation and multimodal LLM-as-a-judge evaluation, show that BabelDOC improves layout fidelity, visual aesthetics, and terminology consistency over representative baselines, while maintaining competitive translation precision. The open-source toolkit and its interactive downstream applications have garnered over 7.8k stars on GitHub (https://github.com/funstory-ai/BabelDOC). A demonstration video is available at https://youtu.be/chwrlApH7a4.
2025
PDFMathTranslate: Scientific Document Translation Preserving Layouts
Rongxin Ouyang | Chang Chu | Zhikuang Xin | Xiangyao Ma
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Language barriers in scientific documents hinder the diffusion and development of science and technologies. However, prior efforts in translating such documents largely overlooked the information in layouts. To bridge the gap, we introduce PDFMathTranslate, the world’s first open-source software for translating scientific documents while preserving layouts. Leveraging the most recent advances in large language models and precise layout detection, we contribute to the community with key improvements in precision, flexibility, and efficiency. The work is open-sourced at https://github.com/byaidu/pdfmathtranslate with more than 222k downloads.