Detecting Machine-Translated Text using Back Translation
Hoang-Quoc Nguyen-Son, Thao Tran Phuong, Seira Hidano, Shinsaku Kiyomoto
Abstract
Machine-translated text plays a crucial role in communication between people who use different languages. However, adversaries can use such text for malicious purposes such as plagiarism and fake reviews. Existing methods detect machine-translated text using only the text's intrinsic content, which makes them unsuitable for distinguishing machine-translated from human-written texts with the same meaning. We propose a method that extracts features for distinguishing machine from human text based on the similarity between the text itself and its back-translation. In an evaluation on detecting sentences translated with French, our method achieves 75.0% in both accuracy and F-score. It outperforms existing methods, whose best accuracy is 62.8% and F-score is 62.7%. The proposed method detects back-translated text even more effectively, with 83.4% accuracy, compared with the best previous accuracy of 66.7%. We obtain similar results for F-score and in corresponding experiments with Japanese. Moreover, we show that our detector can recognize both machine-translated and machine-back-translated texts without knowing the language used to generate them. This demonstrates the robustness of our method across various applications in both low- and rich-resource languages.
- Anthology ID:
- W19-8626
- Volume:
- Proceedings of the 12th International Conference on Natural Language Generation
- Month:
- October–November
- Year:
- 2019
- Address:
- Tokyo, Japan
- Editors:
- Kees van Deemter, Chenghua Lin, Hiroya Takamura
- Venue:
- INLG
- SIG:
- SIGGEN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 189–197
- URL:
- https://aclanthology.org/W19-8626
- DOI:
- 10.18653/v1/W19-8626
- Cite (ACL):
- Hoang-Quoc Nguyen-Son, Thao Tran Phuong, Seira Hidano, and Shinsaku Kiyomoto. 2019. Detecting Machine-Translated Text using Back Translation. In Proceedings of the 12th International Conference on Natural Language Generation, pages 189–197, Tokyo, Japan. Association for Computational Linguistics.
- Cite (Informal):
- Detecting Machine-Translated Text using Back Translation (Nguyen-Son et al., INLG 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/W19-8626.pdf
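
The core idea in the abstract, comparing a text with its own back-translation, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The translator callables `to_pivot` and `from_pivot` are hypothetical stand-ins for a real machine translation system, and the clipped n-gram overlap used as the similarity score is an assumption chosen for simplicity; the features used in the paper may differ. The classifier that would consume these scores is omitted.

```python
from collections import Counter
from typing import Callable, List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(reference: str, candidate: str, n: int) -> float:
    """Fraction of candidate n-grams that also occur in the reference.

    Counts are clipped so a repeated n-gram is not credited more often
    than it appears in the reference. Tokenization is whitespace-based.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def back_translation_features(
    text: str,
    to_pivot: Callable[[str], str],    # hypothetical: e.g. English -> French
    from_pivot: Callable[[str], str],  # hypothetical: e.g. French -> English
    max_n: int = 2,
) -> List[float]:
    """Similarity scores between a text and its round-trip translation."""
    back_translated = from_pivot(to_pivot(text))
    return [ngram_overlap(text, back_translated, n) for n in range(1, max_n + 1)]


if __name__ == "__main__":
    # Toy translators for demonstration only: the round trip replaces one word.
    toy_to_pivot = lambda s: s
    toy_from_pivot = lambda s: s.replace("quick", "fast")
    print(back_translation_features("the quick brown fox", toy_to_pivot, toy_from_pivot))
```

In practice the two callables would wrap calls to a translation service, and the resulting score vector would be one input row for a binary classifier trained on labeled human-written and machine-translated texts.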