SIMPLEX-PB 2.0: A Reliable Dataset for Lexical Simplification in Brazilian Portuguese
Abstract
Most research on Lexical Simplification (LS) addresses non-native speakers of English, since they are numerous and easy to recruit. This makes it difficult to create LS solutions for other languages and target audiences. This paper presents SIMPLEX-PB 2.0, a dataset for LS in Brazilian Portuguese that, unlike its predecessor SIMPLEX-PB, accurately captures the needs of underprivileged Brazilian children. To create SIMPLEX-PB 2.0, we addressed all limitations of the old SIMPLEX-PB through multiple rounds of manual annotation. As a result, SIMPLEX-PB 2.0 features much more reliable and numerous candidate substitutions for complex words, as well as word complexity rankings produced by a group of underprivileged children.
- Anthology ID:
- 2020.winlp-1.6
- Volume:
- Proceedings of the Fourth Widening Natural Language Processing Workshop
- Month:
- July
- Year:
- 2020
- Address:
- Seattle, USA
- Editors:
- Rossana Cunha, Samira Shaikh, Erika Varis, Ryan Georgi, Alicia Tsai, Antonios Anastasopoulos, Khyathi Raghavi Chandu
- Venue:
- WiNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 18–22
- URL:
- https://aclanthology.org/2020.winlp-1.6
- DOI:
- 10.18653/v1/2020.winlp-1.6
- Cite (ACL):
- Nathan Hartmann, Gustavo Henrique Paetzold, and Sandra Aluísio. 2020. SIMPLEX-PB 2.0: A Reliable Dataset for Lexical Simplification in Brazilian Portuguese. In Proceedings of the Fourth Widening Natural Language Processing Workshop, pages 18–22, Seattle, USA. Association for Computational Linguistics.
- Cite (Informal):
- SIMPLEX-PB 2.0: A Reliable Dataset for Lexical Simplification in Brazilian Portuguese (Hartmann et al., WiNLP 2020)