2020
Constructing Multimodal Language Learner Texts Using LARA: Experiences with Nine Languages
Elham Akhlaghi | Branislav Bédi | Fatih Bektaş | Harald Berthelsen | Matthias Butterweck | Cathy Chua | Catia Cucchiarini | Gülşen Eryiğit | Johanna Gerlach | Hanieh Habibi | Neasa Ní Chiaráin | Manny Rayner | Steinþór Steingrímsson | Helmer Strik
Proceedings of the 12th Language Resources and Evaluation Conference
LARA (Learning and Reading Assistant) is an open source platform whose purpose is to support easy conversion of plain texts into multimodal online versions suitable for use by language learners. This involves semi-automatically tagging the text, adding other annotations and recording audio. The platform is suitable for creating texts in multiple languages via crowdsourcing techniques that can be used for teaching a language via reading and listening. We present results of initial experiments by various collaborators where we measure the time required to produce substantial LARA resources, up to the length of short novels, in Dutch, English, Farsi, French, German, Icelandic, Irish, Swedish and Turkish. The first results are encouraging. Although there are some startup problems, the conversion task seems manageable for the languages tested so far. The resulting enriched texts are posted online and are freely available in both source and compiled form.
BLISS: An Agent for Collecting Spoken Dialogue Data about Health and Well-being
Jelte van Waterschoot | Iris Hendrickx | Arif Khan | Esther Klabbers | Marcel de Korte | Helmer Strik | Catia Cucchiarini | Mariët Theune
Proceedings of the 12th Language Resources and Evaluation Conference
An important objective in health technology is the ability to gather information about people’s well-being. Structured interviews can be used to obtain this information, but are time-consuming and not scalable. Questionnaires provide an alternative way to extract such information, though typically lack depth. In this paper, we present our first prototype of the BLISS agent, an artificially intelligent agent which intends to automatically discover what makes people happy and healthy. The goal of Behaviour-based Language-Interactive Speaking Systems (BLISS) is to understand the motivations behind people’s happiness by conducting a personalized spoken dialogue based on a happiness model. We built our first prototype of the model to collect 55 spoken dialogues, in which the BLISS agent asked users questions about their happiness and well-being. Apart from a description of the BLISS architecture, we also provide details about our dataset, which contains over 120 activities and 100 motivations and is made available for usage.
Dedicated Language Resources for Interdisciplinary Research on Multiword Expressions: Best Thing since Sliced Bread
Ferdy Hubers | Catia Cucchiarini | Helmer Strik
Proceedings of the 12th Language Resources and Evaluation Conference
Multiword expressions such as idioms (beat about the bush), collocations (plastic surgery) and lexical bundles (in the middle of) are challenging for disciplines like Natural Language Processing (NLP), psycholinguistics and second language acquisition, due to their more or less fixed character. Idiomatic expressions are especially problematic, because they convey a figurative meaning that cannot always be inferred from the literal meanings of the component words. Researchers acknowledge that important properties that characterize idioms, such as frequency of exposure, familiarity, transparency, and imageability, should be taken into account in research, but these are typically properties that rely on subjective judgments. This is probably one of the reasons why many studies that investigated idiomatic expressions collected limited information about idiom properties for very small numbers of idioms only. In this paper we report on cross-boundary work aimed at developing a set of tools and language resources that are considered crucial for this kind of multifaceted research. We discuss the results of our research and suggest possible avenues for future research.
2016
A Shared Task for Spoken CALL?
Claudia Baur | Johanna Gerlach | Manny Rayner | Martin Russell | Helmer Strik
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
We argue that the field of spoken CALL needs a shared task in order to facilitate comparisons between different groups and methodologies, and describe a concrete example of such a task, based on data collected from a speech-enabled online tool which has been used to help young Swiss German teens practise skills in English conversation. Items are prompt-response pairs, where the prompt is a piece of German text and the response is a recorded English audio file. The task is to label pairs as “accept” or “reject”, accepting responses which are grammatically and linguistically correct to match a set of hidden gold standard answers as closely as possible. Initial resources are provided so that a scratch system can be constructed with a minimal investment of effort, and in particular without necessarily using a speech recogniser. Training data for the task will be released in June 2016, and test data in January 2017.
A Dutch Dysarthric Speech Database for Individualized Speech Therapy Research
Emre Yilmaz | Mario Ganzeboom | Lilian Beijer | Catia Cucchiarini | Helmer Strik
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
We present a new Dutch dysarthric speech database containing utterances of neurological patients with Parkinson’s disease, traumatic brain injury and cerebrovascular accident. The speech content is phonetically and linguistically diversified by using numerous structured sentence and word lists. Containing more than 6 hours of mildly to moderately dysarthric speech, this database can be used for research on dysarthria and for developing and testing speech-to-text systems designed for medical applications. Current activities aimed at extending this database are also discussed.
2014
ASR-based CALL systems and learner speech data: new resources and opportunities for research and development in second language learning
Catia Cucchiarini | Steve Bodnar | Bart Penning de Vries | Roeland van Hout | Helmer Strik
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
In this paper we describe the language resources developed within the project Feedback and the Acquisition of Syntax in Oral Proficiency (FASOP), which is aimed at investigating the effectiveness of various forms of practice and feedback on the acquisition of syntax in second language (L2) oral proficiency, as well as their interplay with learner characteristics such as education level, learner motivation and confidence. For this purpose, use is made of a Computer Assisted Language Learning (CALL) system that employs Automatic Speech Recognition (ASR) technology to allow spoken interaction and to create an experimental environment that guarantees as much control over the language learning setting as possible. The focus of the present paper is on the resources that are being produced in FASOP. In line with the theme of this conference, we present the different types of resources developed within this project and the way in which these could be used to pursue innovative research in second language acquisition and to develop and improve ASR-based language learning applications.
2012
The effect of domain and text type on text prediction quality
Suzan Verberne | Antal van den Bosch | Helmer Strik | Lou Boves
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
The DISCO ASR-based CALL system: practicing L2 oral skills and beyond
Helmer Strik | Jozef Colpaert | Joost van Doremalen | Catia Cucchiarini
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
In this paper we describe the research that was carried out and the resources that were developed within the DISCO (Development and Integration of Speech technology into COurseware for language learning) project. This project aimed at developing an ASR-based CALL system that automatically detects pronunciation and grammar errors in Dutch L2 speaking and generates appropriate, detailed feedback on the errors detected. We briefly introduce the DISCO system and present its design, architecture and speech recognition modules. We then describe a first evaluation of the complete DISCO system and present some results. The resources generated through DISCO are subsequently described together with possible ways of efficiently generating additional resources in the future.
2010
Human Language Technology and Communicative Disabilities: Requirements and Possibilities for the Future
Marina B. Ruiter | Toni C. M. Rietveld | Catia Cucchiarini | Emiel J. Krahmer | Helmer Strik
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
For some years now, the Nederlandse Taalunie (Dutch Language Union) has been active in promoting the development of human language technology (HLT) applications for users of Dutch with communication disabilities. The reason is that HLT products and services may enable these users to improve their verbal autonomy and communication skills. We sought to identify a minimum common set of HLT resources that is required to develop tools for a wide range of communication disabilities. In order to reach this goal, we investigated the specific HLT needs of communicatively disabled people and related these needs to the underlying HLT software components. By analysing the availability and quality of these essential HLT resources, we were able to identify which of the crucial elements need further research and development to become usable for developing applications for communicatively disabled users of Dutch. The results obtained in the current survey can be used to inform policy institutions on how they can stimulate the development of HLT resources for this target group. In the current study results were obtained for Dutch, but a similar approach can also be used for other languages.
2004
On the Usefulness of Large Spoken Language Corpora for Linguistic Research
Christophe Van Bael | Helmer Strik | Henk van den Heuvel
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
In the past, fundamental linguistic research was typically conducted on small data sets that were handcrafted for the specific research at hand. However, from the eighties onwards, many large spoken language corpora have become available. This study investigates the usefulness of large multi-purpose spoken language corpora for fundamental linguistic research. A research task was designed in which we tried to capture the major pronunciation differences between three speech styles in context-sensitive re-write rules at the phone level. These re-write rules were extracted from the alignments of both a manual phonetic transcription and an automatic phonetic transcription with a canonical reference transcription of the same material.
Improving Automatic Phonetic Transcription of Spontaneous Speech Through Variant-Based Pronunciation Variation Modelling
Diana Binnenpoorte | Catia Cucchiarini | Helmer Strik | Lou Boves
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
2002
A Field Survey for Establishing Priorities in the Development of HLT Resources for Dutch
D. Binnenpoorte | F. De Vriend | J. Sturm | W. Daelemans | H. Strik | C. Cucchiarini
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)