Abstract
Neural sequence-to-sequence models have been successfully applied to text compression. However, these models were trained on huge automatically induced parallel corpora, which are only available for a few domains and tasks. In this paper, we propose a novel interactive setup for neural text compression that enables transferring a model to new domains and compression tasks with minimal human supervision. This is achieved by employing active learning, which intelligently samples from a large pool of unlabeled data. Using this setup, we can successfully adapt a model trained on a small dataset of 40k samples for a headline generation task to a general text compression dataset at an acceptable compression quality with just 500 sampled instances annotated by a human.
- Anthology ID: N19-1262
- Volume: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
- Month: June
- Year: 2019
- Address: Minneapolis, Minnesota
- Editors: Jill Burstein, Christy Doran, Thamar Solorio
- Venue: NAACL
- Publisher: Association for Computational Linguistics
- Pages: 2543–2554
- URL: https://aclanthology.org/N19-1262
- DOI: 10.18653/v1/N19-1262
- Cite (ACL): Avinesh P.V.S and Christian M. Meyer. 2019. Data-efficient Neural Text Compression with Interactive Learning. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2543–2554, Minneapolis, Minnesota. Association for Computational Linguistics.
- Cite (Informal): Data-efficient Neural Text Compression with Interactive Learning (P.V.S & Meyer, NAACL 2019)
- PDF: https://preview.aclanthology.org/improve-issue-templates/N19-1262.pdf
- Code: UKPLab/NAACL2019-interactiveCompression
- Data: Sentence Compression