About the Applicability of Combining Implicit Crowdsourcing and Language Learning for the Collection of NLP Datasets

Verena Lyding, Lionel Nicolas, Alexander König


Abstract
In this article, we present a recent trend of approaches, hereafter referred to as Collect4NLP, and discuss its applicability. Collect4NLP-based approaches collect inputs from language learners through learning exercises and aggregate the collected data to derive linguistic knowledge of expert quality. The primary purpose of these approaches is to improve NLP resources; however, sincere concern for the needs of learners is crucial to making Collect4NLP work. We discuss the applicability of Collect4NLP approaches from two perspectives. On the one hand, we compare Collect4NLP approaches to the two crowdsourcing trends currently most prevalent in NLP, namely Crowdsourcing Platforms (CPs) and Games-With-A-Purpose (GWAPs), and identify the strengths and weaknesses of each trend. By doing so, we aim to highlight the particularities of each trend and to identify the settings in which one trend should be favored over the other two. On the other hand, we analyze the applicability of Collect4NLP approaches to the production of different types of NLP resources. We first list the types of NLP resources most used within the NLP community and then propose a set of blueprints for mapping these resources to well-established language learning exercises as found in standard language learning textbooks.
Anthology ID:
2022.nidcp-1.8
Volume:
Proceedings of the 2nd Workshop on Novel Incentives in Data Collection from People: models, implementations, challenges and results within LREC 2022
Month:
June
Year:
2022
Address:
Marseille, France
Venue:
NIDCP
Publisher:
European Language Resources Association
Pages:
46–57
URL:
https://aclanthology.org/2022.nidcp-1.8
Cite (ACL):
Verena Lyding, Lionel Nicolas, and Alexander König. 2022. About the Applicability of Combining Implicit Crowdsourcing and Language Learning for the Collection of NLP Datasets. In Proceedings of the 2nd Workshop on Novel Incentives in Data Collection from People: models, implementations, challenges and results within LREC 2022, pages 46–57, Marseille, France. European Language Resources Association.
Cite (Informal):
About the Applicability of Combining Implicit Crowdsourcing and Language Learning for the Collection of NLP Datasets (Lyding et al., NIDCP 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.nidcp-1.8.pdf