Georg Moser
2025
LaTeXMT: Machine Translation for LaTeX Documents
Calvin Hoy | Samuel Frontull | Georg Moser
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
While machine translation has taken great strides in recent years, thanks in large part to transformer language models, machine translation tools are designed primarily for plain text and are thus not equipped to deal with complex markup documents. Not even Large Language Models can reliably handle LaTeX source files, as non-standard structures are not captured by any available training data. Previous attempts to create translation engines for LaTeX either work on compiled documents, rely on document pre-processors which may lose critical semantic elements, or cannot distinguish between text and non-text content. In this paper we present LaTeXMT, a software solution for structure-preserving, source-to-source translation of LaTeX documents. All of the source code for LaTeXMT is provided under the LGPL-3.0 open-source licence and a web version is publicly available.
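The distinction the abstract draws between text and non-text content can be made concrete with a short sketch. The following is a deliberately simplified, regex-based illustration of placeholder masking, not LaTeXMT's actual pipeline; translate() is a hypothetical stand-in for any MT backend.

    import re

    def translate(text: str) -> str:
        """Hypothetical stand-in for any MT backend (not from the paper)."""
        return text  # identity translation, for demonstration only

    # Spans that must survive translation untouched: math, environments, commands.
    MASK_PATTERN = re.compile(
        r"\$\$.*?\$\$"                                  # display math
        r"|\$[^$]*\$"                                   # inline math
        r"|\\begin\{(\w+)\}.*?\\end\{\1\}"              # whole environments
        r"|\\[a-zA-Z]+(?:\[[^\]]*\])?(?:\{[^{}]*\})?",  # commands, one simple argument
        re.DOTALL,
    )

    def translate_latex(source: str) -> str:
        """Mask non-text spans, translate the remainder, restore the masks."""
        masks: list[str] = []

        def _mask(match: re.Match) -> str:
            masks.append(match.group(0))
            return f"<MASK{len(masks) - 1}>"  # real systems need model-safe tokens

        masked = MASK_PATTERN.sub(_mask, source)
        translated = translate(masked)
        return re.sub(r"<MASK(\d+)>", lambda m: masks[int(m.group(1))], translated)

    print(translate_latex(r"The identity $f(x) = x$ is \emph{trivial}."))

Note that the command branch also masks arguments such as \emph{trivial}, hiding translatable text; this is exactly the kind of semantic loss the abstract attributes to pre-processor-based approaches and that a parser-based, structure-preserving tool avoids.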
2024
Rule-Based, Neural and LLM Back-Translation: Comparative Insights from a Variant of Ladin
Samuel Frontull | Georg Moser
Proceedings of the Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024)
This paper explores the impact of different back-translation approaches on machine translation for Ladin, specifically the Val Badia variant. Given the limited amount of parallel data available for this language (only 18k Ladin-Italian sentence pairs), we investigate the performance of a multilingual neural machine translation model fine-tuned for Ladin-Italian. In addition to the available authentic data, we synthesise further translations by using three different models: a fine-tuned neural model, a rule-based system developed specifically for this language pair, and a large language model. Our experiments show that all approaches achieve comparable translation quality in this low-resource scenario, yet round-trip translations highlight differences in model performance.
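Back-translation, the technique the paper compares across three model types, can be sketched in a few lines: monolingual Italian text is machine-translated into Ladin, and the resulting synthetic pairs are mixed with the authentic data before fine-tuning. A minimal sketch with hypothetical names (translate_it_to_lld and the placeholder data below are illustrative, not from the paper):

    def translate_it_to_lld(sentence: str) -> str:
        """Hypothetical stand-in for any of the paper's three back-translation
        models: fine-tuned NMT, rule-based system, or LLM."""
        return "<synthetic Ladin for: " + sentence + ">"  # placeholder output

    def back_translate(monolingual_it: list[str]) -> list[tuple[str, str]]:
        """Turn monolingual Italian into synthetic (Ladin, Italian) pairs.

        The machine-generated sentence serves as the source side during
        fine-tuning, so the model still learns to produce authentic,
        human-written target text.
        """
        return [(translate_it_to_lld(s), s) for s in monolingual_it]

    # Mix the ~18k authentic pairs with the synthetic ones before fine-tuning.
    authentic = [("<authentic Ladin>", "<authentic Italian>")]  # illustrative
    training_data = authentic + back_translate(["<monolingual Italian sentence>"])
    print(training_data)

The round-trip evaluation mentioned in the abstract follows the same pattern in reverse: translate a Ladin sentence to Italian and back, then compare the result against the original.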