Abstract
In this paper, we describe our approach to generating and ranking candidate contextual simplifications for a given complex word, using pre-trained language models (e.g., BERT), publicly available word embeddings (e.g., FastText), and a part-of-speech tagger. In this shared task, our system, PresiUniv, placed first in the Spanish track, 5th in the Brazilian-Portuguese track, and 10th in the English track. We release our code and data to aid replication of our results. We also analyze some of our system's errors and describe the design decisions we made.
- Anthology ID:
- 2022.tsar-1.22
- Volume:
- Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates (Virtual)
- Editors:
- Sanja Štajner, Horacio Saggion, Daniel Ferrés, Matthew Shardlow, Kim Cheng Sheang, Kai North, Marcos Zampieri, Wei Xu
- Venue:
- TSAR
- Publisher:
- Association for Computational Linguistics
- Pages:
- 213–217
- URL:
- https://aclanthology.org/2022.tsar-1.22
- DOI:
- 10.18653/v1/2022.tsar-1.22
- Cite (ACL):
- Peniel Whistely, Sandeep Mathias, and Galiveeti Poornima. 2022. PresiUniv at TSAR-2022 Shared Task: Generation and Ranking of Simplification Substitutes of Complex Words in Multiple Languages. In Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), pages 213–217, Abu Dhabi, United Arab Emirates (Virtual). Association for Computational Linguistics.
- Cite (Informal):
- PresiUniv at TSAR-2022 Shared Task: Generation and Ranking of Simplification Substitutes of Complex Words in Multiple Languages (Whistely et al., TSAR 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2022.tsar-1.22.pdf
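The abstract describes a generate-then-rank pipeline: a masked language model proposes substitute words, and word embeddings (with part-of-speech agreement) filter and rank them. A minimal sketch of the ranking step, using toy vectors in place of FastText embeddings and a hypothetical `rank_candidates` helper (all names and values here are illustrative, not the authors' implementation):

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_candidates(complex_word, candidates, embeddings,
                    pos_of=None, target_pos=None):
    """Rank substitute candidates by embedding similarity to the
    complex word, optionally discarding candidates whose POS tag
    differs from the target's (hypothetical interface)."""
    scored = []
    for cand in candidates:
        if cand == complex_word or cand not in embeddings:
            continue  # skip the word itself and out-of-vocabulary items
        if pos_of and target_pos and pos_of(cand) != target_pos:
            continue  # POS mismatch: likely an ungrammatical substitute
        scored.append((cand, cosine(embeddings[complex_word],
                                    embeddings[cand])))
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Toy embeddings standing in for FastText vectors (made-up values).
emb = {
    "compulsory": [0.9, 0.1, 0.3],
    "mandatory":  [0.8, 0.2, 0.3],
    "optional":   [0.1, 0.9, 0.2],
    "required":   [0.7, 0.3, 0.4],
}
ranking = rank_candidates("compulsory",
                          ["mandatory", "optional", "required"], emb)
print(ranking[0][0])  # → mandatory
```

In a full pipeline, the candidate list would come from a BERT-style fill-mask query over the sentence with the complex word masked, and the embeddings from a language-specific FastText model.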