Abstract
In this paper we propose a novel approach to automatically classify the level of formality in Japanese text, using three categories (formal, polite, and informal). We introduce a new dataset that combines manually annotated sentences from existing resources with formal sentences scraped from the website of the House of Representatives and the House of Councilors of Japan. Based on our data, we propose a Transformer-based classification model for Japanese, which obtains state-of-the-art results on benchmark datasets. We further propose to utilize our classifier to study the effectiveness of prompting techniques for controlling the formality level of machine translation (MT) using Large Language Models (LLMs). Our experimental setting includes a large selection of such models and is based on an En->Ja parallel corpus specifically designed to test formality control in MT. Our results validate the robustness and effectiveness of our proposed approach while also providing empirical evidence suggesting that prompting LLMs is a viable approach to control the formality level of En->Ja MT.
- Anthology ID:
- 2023.wmt-1.49
- Volume:
- Proceedings of the Eighth Conference on Machine Translation
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 551–560
- URL:
- https://aclanthology.org/2023.wmt-1.49
- DOI:
- 10.18653/v1/2023.wmt-1.49
- Cite (ACL):
- Edison Marrese-Taylor, Pin Chen Wang, and Yutaka Matsuo. 2023. Towards Better Evaluation for Formality-Controlled English-Japanese Machine Translation. In Proceedings of the Eighth Conference on Machine Translation, pages 551–560, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Towards Better Evaluation for Formality-Controlled English-Japanese Machine Translation (Marrese-Taylor et al., WMT 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2023.wmt-1.49.pdf