Sentence Compression for Arbitrary Languages via Multilingual Pivoting

Jonathan Mallinson, Rico Sennrich, Mirella Lapata


Abstract
In this paper we advocate the use of bilingual corpora, which are abundantly available, for training sentence compression models. Our approach borrows much of its machinery from neural machine translation and leverages bilingual pivoting: compressions are obtained by translating a source string into a foreign language and then back-translating it into the source language while controlling the translation length. Our model can be trained for any language as long as a bilingual corpus is available, and it performs arbitrary rewrites without access to compression-specific data. We release Moss, a new parallel Multilingual Compression dataset for English, German, and French, which can be used to evaluate compression models across languages and genres.
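
The pivoting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `compress_by_pivoting`, `to_pivot`, `from_pivot`, and `max_ratio` are hypothetical names, the translators are toy stand-ins for trained translation models, and filtering candidates by word count is an assumed simplification of "controlling the translation length".

```python
from typing import Callable, List

# Hypothetical stand-in type: a translator maps one sentence to a list of
# candidate translations (a real system would use a neural decoder).
Translator = Callable[[str], List[str]]


def compress_by_pivoting(
    sentence: str,
    to_pivot: Translator,
    from_pivot: Translator,
    max_ratio: float = 0.8,
) -> str:
    """Round-trip `sentence` through a pivot language and keep a shorter paraphrase.

    Assumption: length control is approximated by discarding back-translations
    whose word count exceeds `max_ratio` times the source word count.
    """
    budget = max_ratio * len(sentence.split())

    # Collect every back-translation of every pivot translation.
    candidates: List[str] = []
    for pivot in to_pivot(sentence):
        candidates.extend(from_pivot(pivot))

    within_budget = [c for c in candidates if len(c.split()) <= budget]
    if not within_budget:
        # Nothing fits the length budget: fall back to the uncompressed input.
        return sentence
    # Prefer the longest candidate that fits, to keep as much content as possible.
    return max(within_budget, key=lambda c: len(c.split()))


# Toy translators with hard-coded outputs, for illustration only.
def toy_en_to_de(sentence: str) -> List[str]:
    return ["der Hund, der sehr alt war, schlief"]


def toy_de_to_en(sentence: str) -> List[str]:
    return [
        "the dog, which was very old, slept",
        "the old dog slept",
        "the dog slept",
    ]


source = "the dog, which was very old, was sleeping"
compressed = compress_by_pivoting(source, toy_en_to_de, toy_de_to_en)
print(compressed)
```

With the toy translators above, the eight-word source has a budget of 6.4 words, so the seven-word back-translation is rejected and the longest remaining candidate is returned.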
Anthology ID:
D18-1267
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2453–2464
URL:
https://aclanthology.org/D18-1267
DOI:
10.18653/v1/D18-1267
Cite (ACL):
Jonathan Mallinson, Rico Sennrich, and Mirella Lapata. 2018. Sentence Compression for Arbitrary Languages via Multilingual Pivoting. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2453–2464, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Sentence Compression for Arbitrary Languages via Multilingual Pivoting (Mallinson et al., EMNLP 2018)
PDF:
https://preview.aclanthology.org/ingestion-script-update/D18-1267.pdf
Attachment:
 D18-1267.Attachment.zip
Video:
 https://vimeo.com/305663630
Code:
 Jmallins/MOSS
Data:
Sentence Compression