Sentence Alignment Methods for Improving Text Simplification Systems

Sanja Štajner, Marc Franco-Salvador, Simone Paolo Ponzetto, Paolo Rosso, Heiner Stuckenschmidt


Abstract
We provide several methods for sentence-alignment of texts with different complexity levels. Using the best of them, we sentence-align the Newsela corpora, thus providing large training materials for automatic text simplification (ATS) systems. We show that using this dataset, even the standard phrase-based statistical machine translation models for ATS can outperform the state-of-the-art ATS systems.
Anthology ID:
P17-2016
Volume:
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2017
Address:
Vancouver, Canada
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
97–102
URL:
https://aclanthology.org/P17-2016
DOI:
10.18653/v1/P17-2016
Cite (ACL):
Sanja Štajner, Marc Franco-Salvador, Simone Paolo Ponzetto, Paolo Rosso, and Heiner Stuckenschmidt. 2017. Sentence Alignment Methods for Improving Text Simplification Systems. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 97–102, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Sentence Alignment Methods for Improving Text Simplification Systems (Štajner et al., ACL 2017)
PDF:
https://preview.aclanthology.org/update-css-js/P17-2016.pdf
Video:
https://vimeo.com/234958364
Data:
Newsela