MultiLS: An End-to-End Lexical Simplification Framework
Kai North, Tharindu Ranasinghe, Matthew Shardlow, Marcos Zampieri
Abstract
Lexical Simplification (LS) automatically replaces difficult-to-read words with easier alternatives while preserving a sentence’s original meaning. Several datasets exist for LS, and each of them specializes in one or two sub-tasks within the LS pipeline. However, to date, no single LS dataset has been developed that covers all LS sub-tasks. We present MultiLS, the first LS framework that allows for the creation of a multi-task LS dataset. We also present MultiLS-PT, the first dataset created using the MultiLS framework. We demonstrate the potential of MultiLS-PT by carrying out all LS sub-tasks of (1) lexical complexity prediction (LCP), (2) substitute generation, and (3) substitute ranking for Portuguese.
- Anthology ID:
- 2024.tsar-1.1
- Volume:
- Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024)
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Matthew Shardlow, Horacio Saggion, Fernando Alva-Manchego, Marcos Zampieri, Kai North, Sanja Štajner, Regina Stodden
- Venues:
- TSAR | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1–11
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2024.tsar-1.1/
- DOI:
- 10.18653/v1/2024.tsar-1.1
- Cite (ACL):
- Kai North, Tharindu Ranasinghe, Matthew Shardlow, and Marcos Zampieri. 2024. MultiLS: An End-to-End Lexical Simplification Framework. In Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024), pages 1–11, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- MultiLS: An End-to-End Lexical Simplification Framework (North et al., TSAR 2024)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2024.tsar-1.1.pdf