Sentence Alignment Methods for Improving Text Simplification Systems
Sanja Štajner, Marc Franco-Salvador, Simone Paolo Ponzetto, Paolo Rosso, Heiner Stuckenschmidt
Abstract
We provide several methods for sentence-alignment of texts with different complexity levels. Using the best of them, we sentence-align the Newsela corpora, thus providing large training materials for automatic text simplification (ATS) systems. We show that using this dataset, even the standard phrase-based statistical machine translation models for ATS can outperform the state-of-the-art ATS systems.
- Anthology ID: P17-2016
- Volume: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
- Month: July
- Year: 2017
- Address: Vancouver, Canada
- Editors: Regina Barzilay, Min-Yen Kan
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 97–102
- URL: https://aclanthology.org/P17-2016
- DOI: 10.18653/v1/P17-2016
- Cite (ACL): Sanja Štajner, Marc Franco-Salvador, Simone Paolo Ponzetto, Paolo Rosso, and Heiner Stuckenschmidt. 2017. Sentence Alignment Methods for Improving Text Simplification Systems. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 97–102, Vancouver, Canada. Association for Computational Linguistics.
- Cite (Informal): Sentence Alignment Methods for Improving Text Simplification Systems (Štajner et al., ACL 2017)
- PDF: https://preview.aclanthology.org/ingest-2024-clasp/P17-2016.pdf
- Data: Newsela