Parallel Corpus Filtering for Japanese Text Simplification

Koki Hatagaki, Tomoyuki Kajiwara, Takashi Ninomiya


Abstract
We propose a method of parallel corpus filtering for Japanese text simplification. The parallel corpus for this task contains some redundant wording. In this study, we first identify the type and size of noisy sentence pairs in the Japanese text simplification corpus. We then propose a method of parallel corpus filtering to remove each type of noisy sentence pair. Experimental results show that filtering the training parallel corpus with the proposed method improves simplification performance.
Anthology ID:
2022.tsar-1.2
Volume:
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Virtual)
Editors:
Sanja Štajner, Horacio Saggion, Daniel Ferrés, Matthew Shardlow, Kim Cheng Sheang, Kai North, Marcos Zampieri, Wei Xu
Venue:
TSAR
Publisher:
Association for Computational Linguistics
Pages:
12–18
URL:
https://aclanthology.org/2022.tsar-1.2
DOI:
10.18653/v1/2022.tsar-1.2
Cite (ACL):
Koki Hatagaki, Tomoyuki Kajiwara, and Takashi Ninomiya. 2022. Parallel Corpus Filtering for Japanese Text Simplification. In Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), pages 12–18, Abu Dhabi, United Arab Emirates (Virtual). Association for Computational Linguistics.
Cite (Informal):
Parallel Corpus Filtering for Japanese Text Simplification (Hatagaki et al., TSAR 2022)
PDF:
https://preview.aclanthology.org/naacl24-info/2022.tsar-1.2.pdf
Video:
https://preview.aclanthology.org/naacl24-info/2022.tsar-1.2.mp4