2022
About the Applicability of Combining Implicit Crowdsourcing and Language Learning for the Collection of NLP Datasets
Verena Lyding | Lionel Nicolas | Alexander König
Proceedings of the 2nd Workshop on Novel Incentives in Data Collection from People: models, implementations, challenges and results within LREC 2022
In this article, we present a recent trend of approaches, hereafter referred to as Collect4NLP, and discuss its applicability. Collect4NLP-based approaches collect inputs from language learners through learning exercises and aggregate the collected data to derive linguistic knowledge of expert quality. The primary purpose of these approaches is to improve NLP resources; however, sincere concern with the needs of learners is crucial for making Collect4NLP work. We discuss the applicability of Collect4NLP approaches from two perspectives. On the one hand, we compare Collect4NLP approaches to the two crowdsourcing trends currently most prevalent in NLP, namely Crowdsourcing Platforms (CPs) and Games-With-A-Purpose (GWAPs), and identify strengths and weaknesses of each trend. By doing so, we aim to highlight particularities of each trend and to identify in which kinds of settings one trend should be favored over the other two. On the other hand, we analyze the applicability of Collect4NLP approaches to the production of different types of NLP resources. We first list the types of NLP resources most used within the NLP community and then propose a set of blueprints for mapping these resources to well-established language learning exercises as found in standard language learning textbooks.
2021
An Experiment on Implicitly Crowdsourcing Expert Knowledge about Romanian Synonyms from Language Learners
Lionel Nicolas | Lavinia Nicoleta Aparaschivei | Verena Lyding | Christos Rodosthenous | Federico Sangati | Alexander König | Corina Forascu
Proceedings of the 10th Workshop on NLP for Computer Assisted Language Learning
2020
Substituto – A Synchronous Educational Language Game for Simultaneous Teaching and Crowdsourcing
Marianne Grace Araneta | Gülşen Eryiğit | Alexander König | Ji-Ung Lee | Ana Luís | Verena Lyding | Lionel Nicolas | Christos Rodosthenous | Federico Sangati
Proceedings of the 9th Workshop on NLP for Computer Assisted Language Learning
Creating Expert Knowledge by Relying on Language Learners: a Generic Approach for Mass-Producing Language Resources by Combining Implicit Crowdsourcing and Language Learning
Lionel Nicolas | Verena Lyding | Claudia Borg | Corina Forascu | Karën Fort | Katerina Zdravkova | Iztok Kosem | Jaka Čibej | Špela Arhar Holdt | Alice Millour | Alexander König | Christos Rodosthenous | Federico Sangati | Umair ul Hassan | Anisia Katinskaia | Anabela Barreiro | Lavinia Aparaschivei | Yaakov HaCohen-Kerner
Proceedings of the Twelfth Language Resources and Evaluation Conference
We introduce in this paper a generic approach to combine implicit crowdsourcing and language learning in order to mass-produce language resources (LRs) for any language for which a crowd of language learners can be involved. We present the approach by explaining its core paradigm that consists in pairing specific types of LRs with specific exercises, by detailing both its strengths and challenges, and by discussing how much these challenges have been addressed at present. Accordingly, we also report on on-going proof-of-concept efforts aiming at developing the first prototypical implementation of the approach in order to correct and extend an LR called ConceptNet based on the input crowdsourced from language learners. We then present an international network called the European Network for Combining Language Learning with Crowdsourcing Techniques (enetCollect) that provides the context to accelerate the implementation of this generic approach. Finally, we exemplify how it can be used in several language learning scenarios to produce a multitude of NLP resources and how it can therefore alleviate the long-standing NLP issue of the lack of LRs.
Using Crowdsourced Exercises for Vocabulary Training to Expand ConceptNet
Christos Rodosthenous | Verena Lyding | Federico Sangati | Alexander König | Umair ul Hassan | Lionel Nicolas | Jolita Horbacauskiene | Anisia Katinskaia | Lavinia Aparaschivei
Proceedings of the Twelfth Language Resources and Evaluation Conference
In this work, we report on a crowdsourcing experiment conducted using the V-TREL vocabulary trainer, which is accessed via a Telegram chatbot interface, to gather knowledge on word relations suitable for expanding ConceptNet. V-TREL is built on top of a generic architecture implementing the implicit crowdsourcing paradigm in order to offer vocabulary training exercises generated from the commonsense knowledge base ConceptNet and, in the background, to collect and evaluate the learners’ answers to extend ConceptNet with new words. In the experiment, about 90 university students learning English at C1 level of the Common European Framework of Reference for Languages (CEFR) trained their vocabulary with V-TREL over a period of 16 calendar days. The experiment gathered more than 12,000 answers from learners on different question types. In this paper, we present in detail the experimental setup and the outcome of the experiment, which indicates the potential of our approach both for crowdsourcing data and for fostering vocabulary skills.
Digital Language Infrastructures – Documenting Language Actors
Verena Lyding | Alexander König | Monica Pretti
Proceedings of the Twelfth Language Resources and Evaluation Conference
The major European language infrastructure initiatives like CLARIN (Hinrichs and Krauwer, 2014), DARIAH (Edmond et al., 2017) or Europeana (Europeana Foundation, 2015) have been built by focusing in the first place on larger-scale institutions, such as specialized research departments and larger official units like national libraries. However, besides these principal players, a large number of smaller language actors could also contribute to and benefit from language infrastructures. Since these smaller institutions, like local libraries, archives and publishers, often collect, manage and host language resources of particular value for their geographical and cultural region, it seems highly relevant to find ways of engaging them and connecting them to existing European infrastructure initiatives. In this article, we first highlight the need for reaching out to smaller local language actors and discuss challenges related to this ambition. Then we present the first step in how this objective was approached within a local language infrastructure project, namely by means of a structured documentation of the landscape of local language actors in South Tyrol. We describe how the documentation efforts were structured and organized, and what tool we have set up to distribute the collected data online by adapting existing CLARIN solutions.
2019
v-trel: Vocabulary Trainer for Tracing Word Relations - An Implicit Crowdsourcing Approach
Verena Lyding | Christos Rodosthenous | Federico Sangati | Umair ul Hassan | Lionel Nicolas | Alexander König | Jolita Horbacauskiene | Anisia Katinskaia
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
In this paper, we present our work on developing a vocabulary trainer that uses exercises generated from language resources such as ConceptNet and crowdsources the responses of the learners to enrich the language resource. We performed an empirical evaluation of our approach with 60 non-native speakers over two days, which shows that new entries to expand ConceptNet can efficiently be gathered through vocabulary exercises on word relations. We also report on the feedback gathered from the users and from a language-teaching expert, and discuss the potential of the vocabulary trainer application from the user and language learner perspective. The feedback suggests that v-trel has educational potential, although some shortcomings were identified in its current state.
2018
Transc&Anno: A Graphical Tool for the Transcription and On-the-Fly Annotation of Handwritten Documents
Nadezda Okinina | Lionel Nicolas | Verena Lyding
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2016
Design and Development of the MERLIN Learner Corpus Platform
Verena Lyding | Karin Schöne
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
In this paper, we report on the design and development of an online search platform for the MERLIN corpus of learner texts in Czech, German and Italian. It was created in the context of the MERLIN project, which aims at empirically illustrating features of the Common European Framework of Reference (CEFR) for evaluating language competences based on authentic learner text productions compiled into a learner corpus. Furthermore, the project aims at providing access to the corpus through a search interface adapted to the needs of multifaceted target groups involved with language learning and teaching. This article starts by providing a brief overview of the project's ambition, the data resource and its intended target groups. Subsequently, the main focus of the article is on the design and development process of the platform, which was carried out in a user-centred fashion. The paper presents the user studies carried out to collect requirements, details the resulting decisions concerning the platform design and its implementation, and reports on the evaluation of the platform prototype and final adjustments.
2014
The PAISÀ Corpus of Italian Web Texts
Verena Lyding | Egon Stemle | Claudia Borghetti | Marco Brunello | Sara Castagnoli | Felice Dell’Orletta | Henrik Dittmann | Alessandro Lenci | Vito Pirrelli
Proceedings of the 9th Web as Corpus Workshop (WaC-9)
‘interHist’ - an interactive visual interface for corpus exploration
Verena Lyding | Lionel Nicolas | Egon Stemle
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In this article, we present interHist, a compact visualization for the interactive exploration of results of complex corpus queries. Integrated with a search interface to the PAISÀ corpus of Italian web texts, interHist aims at facilitating the exploration of large result sets of linguistic corpus searches. This objective is approached by providing an interactive visual overview of the data, which supports user-steered navigation by means of interactive filtering. It allows users to switch dynamically between an overview of the data and a detailed view of results in their immediate textual context, thus helping to detect and inspect relevant hits more efficiently. We provide background information on corpus linguistics and related work on visualizations for language and linguistic data. We introduce the architecture of interHist by detailing the data structure it relies on, describing the visualization design and providing technical details of the implementation and its integration with the corpus querying environment. Finally, we illustrate its usage by presenting a use case for the analysis of the composition of Italian noun phrases.
2013
High-Accuracy Phrase Translation Acquisition Through Battle-Royale Selection
Lionel Nicolas | Egon W. Stemle | Klara Kranebitter | Verena Lyding
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013
2012
Visualising Linguistic Evolution in Academic Discourse
Verena Lyding | Ekaterina Lapshinova-Koltunski | Stefania Degaetano-Ortlieb | Henrik Dittmann | Chris Culy
Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH
2006
The LexALP Information System: Term Bank and Corpus for Multilingual Legal Terminology Consolidated
Verena Lyding | Elena Chiocchetti | Gilles Sérasset | Francis Brunet-Manquat
Proceedings of the Workshop on Multilingual Language Resources and Interoperability