Proceedings of the Eleventh Workshop on Patent and Scientific Literature Translation (PSLT 2025)

Takashi Tsunakawa, Katsuhito Sudoh, Isao Goto (Editors)


Anthology ID:
2025.pslt-1
Month:
June
Year:
2025
Address:
Geneva, Switzerland
Venue:
pslt
Publisher:
European Association for Machine Translation
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.pslt-1/
ISBN:
978-2-9701897-2-5
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.pslt-1.pdf

Proceedings of the Eleventh Workshop on Patent and Scientific Literature Translation (PSLT 2025)
Takashi Tsunakawa | Katsuhito Sudoh | Isao Goto

GenAIese - A Comprehensive Comparison of GPT-4o and DeepSeek-V3 for English-to-Chinese Academic Translation
Longhui Zou | Ke Li | Joshua Lamerton | Mehdi Mirzapour

This study investigates the translation performance of two large language models, GPT-4o and DeepSeek-V3, in translating English academic papers on language, culture, and literature into Chinese at the discourse level. Using a corpus of 11 academic texts totaling 3,498 sentences, we evaluated translation quality through an automatic quality estimation metric (COMET-KIWI), lexical diversity indicators, and syntactic complexity measures. Our findings reveal an interesting contrast: while DeepSeek-V3 achieves higher overall quality scores, GPT-4o produces translations with consistently greater lexical richness (higher type-token ratio, standardized TTR, average sentence length, and word entropy) and greater syntactic complexity across all five measured metrics, namely the Incomplete Dependency Theory Metric (IDT), the Dependency Locality Theory Metric (DLT), the Combined IDT+DLT Metric (IDT+DLT), Left-Embeddedness (LE), and Nested Nouns Distance (NND). Particularly notable are GPT-4o’s higher scores on the Left-Embeddedness and Nested Nouns Distance metrics, which are specifically relevant to Chinese linguistic patterns. The divergence between automatic quality estimation and linguistic complexity metrics highlights the multifaceted nature of translation quality assessment.
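
For readers unfamiliar with the lexical diversity indicators named above, here is a minimal Python sketch of how such measures are commonly computed. The whitespace tokenization, window size, and sample sentence are illustrative assumptions, not the authors' actual setup (Chinese output would require a word segmenter such as jieba, and average sentence length additionally needs sentence boundaries).

    import math
    from collections import Counter

    def lexical_diversity(tokens, window=1000):
        """Compute common lexical diversity indicators over a token list."""
        counts = Counter(tokens)
        n = len(tokens)
        # Type-token ratio: distinct tokens over total tokens.
        ttr = len(counts) / n
        # Standardized TTR: mean TTR over fixed-size windows, which removes
        # the dependence of plain TTR on text length.
        windows = [tokens[i:i + window] for i in range(0, n - window + 1, window)]
        sttr = (sum(len(set(w)) / len(w) for w in windows) / len(windows)
                if windows else ttr)
        # Word entropy: Shannon entropy (in bits) of the token distribution.
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        return {"TTR": ttr, "STTR": sttr, "entropy": entropy}

    # Illustrative usage with a tiny window so the toy sentence yields
    # more than one window.
    print(lexical_diversity("the cat sat on the mat".split(), window=3))
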

Tailoring Machine Translation for Scientific Literature through Topic Filtering and Fuzzy Match Augmentation
Thomas Moerman | Tom Vanallemeersch | Sara Szoc | Arda Tezcan

To enhance the accessibility of scientific literature in multiple languages and to facilitate the exchange of information among scholars and a wider audience, high-performing specialized machine translation (MT) engines are needed. Building them, however, requires efficient filtering and use of domain-specific data. In this study, we investigate whether translation quality improves when training data is expanded through topic filtering and used more efficiently by exploiting fuzzy matches (FMs), i.e. existing translations whose source is similar to a given input. We apply these techniques both to sequence-to-sequence MT models and to off-the-shelf multilingual large language models (LLMs) in three scientific disciplines. Our results suggest that combining topic filtering with FM augmentation is an effective strategy for training neural machine translation (NMT) models from scratch: the resulting models not only surpass baseline NMT models but also outperform smaller LLMs (in terms of the number of parameters). Furthermore, we find that although FM augmentation through in-context learning generally improves LLM translation performance, limited domain-specific datasets can yield results comparable to those achieved with additional multi-domain datasets.
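
As a rough illustration of fuzzy match augmentation through in-context learning, here is a minimal Python sketch: retrieve translation memory entries whose source side resembles the input, then prepend them to the prompt as translation examples. The similarity measure (difflib's SequenceMatcher), the threshold, the English-French pair, and the prompt wording are assumptions for the sketch, not the paper's actual setup.

    from difflib import SequenceMatcher

    # Toy translation memory: (source, target) pairs from a scientific domain.
    TM = [
        ("The enzyme catalyzes the hydrolysis of ATP.",
         "L'enzyme catalyse l'hydrolyse de l'ATP."),
        ("The sample was centrifuged at 4000 rpm.",
         "L'échantillon a été centrifugé à 4000 tr/min."),
    ]

    def fuzzy_matches(src, tm, threshold=0.5, k=2):
        """Return up to k TM entries whose source side is similar to src."""
        scored = [(SequenceMatcher(None, src, s).ratio(), s, t) for s, t in tm]
        return [(s, t) for r, s, t in sorted(scored, reverse=True)
                if r >= threshold][:k]

    def build_prompt(src, tm):
        """Prepend retrieved fuzzy matches as in-context examples."""
        lines = ["Translate from English to French."]
        for s, t in fuzzy_matches(src, tm):
            lines.append(f"English: {s}\nFrench: {t}")
        lines.append(f"English: {src}\nFrench:")
        return "\n".join(lines)

    print(build_prompt("The enzyme catalyzes the oxidation of glucose.", TM))

The same retrieved matches can instead be concatenated to the source sentence when training a sequence-to-sequence NMT model, which is the other augmentation route the abstract mentions.
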