Abstract
Many domains and tasks in natural language generation (NLG) are inherently ‘low-resource’, where training data, tools and linguistic analyses are scarce. This poses a particular challenge to researchers and system developers in the era of machine-learning-driven NLG. In this position paper, we first present the challenges researchers and developers often encounter when dealing with low-resource settings in NLG. We then argue that it is unsustainable to collect large aligned datasets or build large language models from scratch for every possible domain due to cost, labour, and time constraints, so researching and developing methods and resources for low-resource settings is vital. We then discuss current approaches to low-resource NLG, followed by proposed solutions and promising avenues for future work in NLG for low-resource settings.
- Anthology ID:
- 2022.gem-1.29
- Volume:
- Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates (Hybrid)
- Editors:
- Antoine Bosselut, Khyathi Chandu, Kaustubh Dhole, Varun Gangal, Sebastian Gehrmann, Yacine Jernite, Jekaterina Novikova, Laura Perez-Beltrachini
- Venue:
- GEM
- SIG:
- SIGGEN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 336–350
- URL:
- https://aclanthology.org/2022.gem-1.29
- DOI:
- 10.18653/v1/2022.gem-1.29
- Cite (ACL):
- David M. Howcroft and Dimitra Gkatzia. 2022. Most NLG is Low-Resource: here’s what we can do about it. In Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), pages 336–350, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
- Cite (Informal):
- Most NLG is Low-Resource: here’s what we can do about it (Howcroft & Gkatzia, GEM 2022)
- PDF:
- https://preview.aclanthology.org/improve-issue-templates/2022.gem-1.29.pdf