Japanese Sentence Compression with a Large Training Dataset

Shun Hasegawa, Yuta Kikuchi, Hiroya Takamura, Manabu Okumura


Abstract
In English, high-quality sentence compression models that delete words have been trained on large, automatically created training datasets. We address Japanese sentence compression with a similar approach. To create a large Japanese training dataset, we modify a method for creating an English training dataset based on the characteristics of the Japanese language. The created dataset is used to train Japanese sentence compression models based on recurrent neural networks.
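The paper frames compression as word deletion, which is commonly realized as binary sequence labeling: each source token is tagged keep or delete. The sketch below is a minimal illustration of that setup in PyTorch, not the authors' implementation; the BiLSTM architecture, dimensions, vocabulary size, and random stand-in tensors are all illustrative assumptions.

import torch
import torch.nn as nn

class DeletionCompressor(nn.Module):
    """Tags each token keep (1) or delete (0) with a BiLSTM encoder."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, 2)  # keep/delete logits

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) indices of source tokens
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h)  # (batch, seq_len, 2)

# One training step on a (sentence, keep-labels) pair; tensors here are
# random stand-ins for a tokenized sentence and its deletion labels.
model = DeletionCompressor(vocab_size=32000)
loss_fn = nn.CrossEntropyLoss()
tokens = torch.randint(0, 32000, (1, 12))
labels = torch.randint(0, 2, (1, 12))  # 1 = keep the token, 0 = delete
logits = model(tokens)
loss = loss_fn(logits.view(-1, 2), labels.view(-1))
loss.backward()

At inference time, the compressed sentence is read off by keeping exactly the tokens whose predicted label is 1.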
Anthology ID:
P17-2044
Volume:
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2017
Address:
Vancouver, Canada
Editors:
Regina Barzilay, Min-Yen Kan
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
281–286
URL:
https://aclanthology.org/P17-2044
DOI:
10.18653/v1/P17-2044
Cite (ACL):
Shun Hasegawa, Yuta Kikuchi, Hiroya Takamura, and Manabu Okumura. 2017. Japanese Sentence Compression with a Large Training Dataset. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 281–286, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Japanese Sentence Compression with a Large Training Dataset (Hasegawa et al., ACL 2017)
PDF:
https://preview.aclanthology.org/ml4al-ingestion/P17-2044.pdf
Data:
Sentence Compression