In the field of citizen linguistics, various initiatives are aimed at the creation of language resources by members of the public. To recruit and retain these participants different incentives informed by different motivations, extrinsic and intrinsic ones, play a role at different project stages. Illustrated by a project in the field of lexicography which draws on the extrinsic and/or intrinsic motivation of participants, the complexity of providing the ‘right’ incentives is addressed. This complexity does not only surface when considering cultural differences and the heterogeneity of the motivations participants might have but also through the changing motivations over time. Here, identifying target groups may help to guide recruitment, retention and dissemination activities. In addition, continuous adaptations may be required during the course of the project to strike a balance between necessary and feasible incentives.
We describe our submission to the CogALex-VI shared task on the identification of multilingual paradigmatic relations building on XLM-RoBERTa (XLM-R), a robustly optimized and multilingual BERT model. In spite of several experiments with data augmentation, data addition and ensemble methods with a Siamese Triple Net, Translrelation, the XLM-R model with a linear classifier adapted to this specific task, performed best in testing and achieved the best results in the final evaluation of the shared task, even for a previously unseen language.
Language resources are a major ingredient for the advancement of language technologies. Citizen linguistics can help to create language resources and annotate language resources, not only for the improvement of language technologies, such as machine translation but also for the advancement of linguistic research. The (language) resources covered in this article are a corpus related to the Question of the Month project strand, which was initially aimed at co-creation in citizen linguistics and a partially annotated database of pictures of written text in different languages found in the public sphere. The number of participants in these project strands differed significantly. Especially those activities that were related to data collection (and analysis) had a significantly higher number of contributions per participant. This especially held true for the activities with (prize) incentives. Nevertheless, the activities of the Question of the Month could reach a higher number of participants, even after the co-creation approach was no longer followed. In addition, the Question of the Month brought research gaps and new knowledge to light and challenged existing paradigms and practices. These are especially important for the advancement of scholarly research. Citizen linguistics can help gather and analyze linguistic data, including language resources, in a short period of time. Thus, it may help increase the access to and availability of language resources.
The Austrian Language Resource Portal (Sprachressourcenportal Österreichs) is Austria’s central platform for language resources in the area of public administration. It focuses on language resources in the Austrian variety of the German language. As a product of the cooperation between a public administration body and a university, the Portal contains various language resources (terminological resources in the public administration domain, a language guide, named entities based on open public data, translation memories, etc.). German is a pluricentric language that considerably varies in the domain of public administration due to different public administration systems. Therefore, the Austrian Language Resource Portal stresses the importance of language resources specific to a language variety, thus paving the way for the re-use of variety-specific language data for human language technology, such as machine translation training, for the Austrian standard variety.