Japanese Sentence Compression with a Large Training Dataset
Shun Hasegawa, Yuta Kikuchi, Hiroya Takamura, Manabu Okumura
Abstract
In English, high-quality sentence compression models that work by deleting words have been trained on large, automatically created training datasets. We approach Japanese sentence compression in a similar way. To create a large Japanese training dataset, we adapt a method for creating an English training dataset to the characteristics of the Japanese language. The resulting dataset is used to train Japanese sentence compression models based on recurrent neural networks; a minimal illustrative sketch of such a deletion model is given below.
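The abstract describes deletion-based compression with recurrent neural networks. The sketch below shows one plausible instantiation in PyTorch: a bidirectional LSTM that tags each token as KEEP or DELETE, with the compression read off as the kept tokens. The architecture, hyperparameters, and toy input are illustrative assumptions, not the authors' exact configuration.

```python
# A minimal sketch of deletion-based sentence compression:
# a bidirectional LSTM tags each token KEEP (1) or DELETE (0).
# Vocabulary size, dimensions, and the toy input are assumptions
# for illustration, not the paper's configuration.
import torch
import torch.nn as nn

class DeletionCompressor(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Per-token logits over the two labels {DELETE, KEEP}.
        self.out = nn.Linear(2 * hidden_dim, 2)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))  # (batch, seq, 2*hidden)
        return self.out(h)                       # (batch, seq, 2)

# Toy usage: tag a 5-token "sentence" and keep tokens labeled 1.
model = DeletionCompressor(vocab_size=1000)
tokens = torch.randint(0, 1000, (1, 5))  # batch of one sentence
logits = model(tokens)                   # (1, 5, 2)
labels = logits.argmax(dim=-1)           # predicted KEEP/DELETE per token
compressed = tokens[labels.bool()]       # ids of tokens predicted as KEEP
print(compressed)
```

Training such a tagger would minimize per-token cross-entropy against gold keep/delete labels derived from the sentence-compression pairs in the created dataset.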
- Anthology ID:
- P17-2044
- Volume:
- Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
- Month:
- July
- Year:
- 2017
- Address:
- Vancouver, Canada
- Editors:
- Regina Barzilay, Min-Yen Kan
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 281–286
- URL:
- https://aclanthology.org/P17-2044
- DOI:
- 10.18653/v1/P17-2044
- Cite (ACL):
- Shun Hasegawa, Yuta Kikuchi, Hiroya Takamura, and Manabu Okumura. 2017. Japanese Sentence Compression with a Large Training Dataset. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 281–286, Vancouver, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Japanese Sentence Compression with a Large Training Dataset (Hasegawa et al., ACL 2017)
- PDF:
- https://aclanthology.org/P17-2044.pdf
- Data
- Sentence Compression