Paraphrasing as Zero-shot Translation with Feature-guided Diversity Enhancement

Ziyue Yan, Hongying Zan, Xinglin Lyu, Hongfei Xu


Abstract
Paraphrasing uses different words, sentence structures, or expressions to convey similar semantics. It is an effective training data augmentation method for improving low-resource Natural Language Processing (NLP) tasks. Existing studies normally leverage parallel corpora to construct parabanks, regarding the Machine Translation (MT) results of source sentences as the paraphrases of the corresponding target sentences. As MT models are usually trained on the same parallel corpus, translation of the training set may suffer from overfitting, which leads to less diverse paraphrases. Training paraphrasers on a parabank generated via MT may also suffer from information loss: the parabank is derived from the parallel corpora, so the knowledge it contains is a subset of that in the parallel corpora. In this paper, we train a bidirectional Multilingual Neural Machine Translation (MNMT) model on a bidirectional bilingual parallel corpus, and use the MNMT model directly as a paraphrasing model by asking it to generate "translations" into the input language. As some source tokens also appear in their translations in the parallel corpus, we introduce "copy"/"not-copy" tags to indicate the existence/non-existence of source tokens in the target translation during training, and use the "not-copy" tag to encourage paraphrasing during inference. Manual and automatic evaluation results show that our ParaMNMT method generates paraphrases of higher semantic consistency, literal fluency and sentential diversity than existing parabanks and LLMs. Our data augmentation experiments verify the effectiveness of ParaMNMT in improving low-resource NLP tasks.
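The tagging idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the tag strings `<copy>` and `<not-copy>`, the per-token `token|tag` input format, the `<2xx>` target-language prefix, and the function names. The abstract does not say whether the tags are attached per token or per sentence; this sketch assumes per-token tags.

```python
from typing import List

# Hypothetical tag strings; the paper's actual vocabulary may differ.
COPY = "<copy>"
NOT_COPY = "<not-copy>"


def tag_source_tokens(src_tokens: List[str], tgt_tokens: List[str]) -> List[str]:
    """Tag each source token by whether it also occurs in the target translation."""
    tgt_set = set(tgt_tokens)
    return [COPY if tok in tgt_set else NOT_COPY for tok in src_tokens]


def training_input(src_tokens: List[str], tgt_tokens: List[str], tgt_lang: str) -> List[str]:
    """Build a training-time input: target-language prefix plus tagged source tokens."""
    tags = tag_source_tokens(src_tokens, tgt_tokens)
    tagged = [f"{tok}|{tag}" for tok, tag in zip(src_tokens, tags)]
    return [f"<2{tgt_lang}>"] + tagged  # "<2xx>" prefix is an assumed convention


def paraphrase_input(src_tokens: List[str], src_lang: str) -> List[str]:
    """Build an inference-time input: request the input language itself as the
    target (the zero-shot direction) and tag every token as not-copy."""
    tagged = [f"{tok}|{NOT_COPY}" for tok in src_tokens]
    return [f"<2{src_lang}>"] + tagged


if __name__ == "__main__":
    src = ["Paris", "is", "beautiful"]
    tgt = ["Paris", "est", "belle"]
    print(training_input(src, tgt, "fr"))
    print(paraphrase_input(src, "en"))
```

In this sketch the model sees "copy" only on tokens that genuinely survive translation (such as the named entity above), so tagging every token "not-copy" at inference is one plausible way to discourage the model from echoing its input when asked to translate into the same language.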
Anthology ID:
2026.acl-long.783
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
17211–17223
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.783/
Cite (ACL):
Ziyue Yan, Hongying Zan, Xinglin Lyu, and Hongfei Xu. 2026. Paraphrasing as Zero-shot Translation with Feature-guided Diversity Enhancement. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 17211–17223, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Paraphrasing as Zero-shot Translation with Feature-guided Diversity Enhancement (Yan et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.783.pdf
Checklist:
 2026.acl-long.783.checklist.pdf