Abstract
Definition modelling is the task of automatically generating a dictionary-style definition given a target word. In this paper, we consider cross-lingual definition generation. Specifically, we generate English definitions for Wolastoqey (Malecite-Passamaquoddy) words. Wolastoqey is an endangered, low-resource polysynthetic language. We hypothesize that sub-word representations based on byte pair encoding (Sennrich et al., 2016) can be leveraged to represent morphologically complex Wolastoqey words and overcome the challenge of not having large corpora available for training. Our experimental results demonstrate that this approach outperforms baseline methods in terms of BLEU score.
- Anthology ID:
- 2021.ranlp-1.17
- Volume:
- Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
- Month:
- September
- Year:
- 2021
- Address:
- Held Online
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd.
- Pages:
- 138–146
- URL:
- https://aclanthology.org/2021.ranlp-1.17
- Cite (ACL):
- Diego Bear and Paul Cook. 2021. Cross-Lingual Wolastoqey-English Definition Modelling. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 138–146, Held Online. INCOMA Ltd.
- Cite (Informal):
- Cross-Lingual Wolastoqey-English Definition Modelling (Bear & Cook, RANLP 2021)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2021.ranlp-1.17.pdf