Manny Rayner

Also published as: M. Rayner


2024

In Drehu, a language of the indigenous Kanak people of New Caledonia, the word treu ‘moon’ is pronounced [tʃe.u]; but, even if they hear the word, the spelling pulls French speakers to a spurious pronunciation [tʁø]. We implement a strategy to mitigate the influence of such orthographic conflicts, while retaining the benefits of written input on vocabulary learning. We present text in “phonetized” form, where words are broken down into components associated with mnemonically presented phonetic values, adapting features from the “Comment ça se prononce ?” multilingual phonetizer. We present an exploratory project where we used the ChatGPT-based Learning And Reading Assistant (C-LARA) to implement a version of the phonetizer strategy, outlining how the AI-engineered codebase and help from the AI made it easy to add the necessary extensions. We describe two proof-of-concept texts for learners produced using the platform, a Drehu alphabet book and a Drehu version of “The (North) Wind and the Sun”; both texts include native-speaker recorded audio, pronunciation respellings based on French orthography, and AI-generated illustrations.
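
As a rough illustration of the phonetized presentation described above, here is a minimal Python sketch that splits a word into grapheme components and attaches a French-orthography respelling to each one. The component inventory and respellings are invented for illustration; they are not the actual C-LARA resources for Drehu.

# Break a word into grapheme components, each paired with a mnemonic
# respelling based on French orthography. The mapping below is hypothetical.
COMPONENTS = {
    "tr": "tch",   # assumed respelling hinting at the affricate in treu
    "e": "é",
    "u": "ou",
}

def phonetize(word, components=COMPONENTS):
    """Greedily split word into known components, longest match first."""
    pieces, i = [], 0
    keys = sorted(components, key=len, reverse=True)
    while i < len(word):
        for k in keys:
            if word.startswith(k, i):
                pieces.append((k, components[k]))
                i += len(k)
                break
        else:
            pieces.append((word[i], word[i]))  # unknown letter: pass through
            i += 1
    return pieces

print(phonetize("treu"))   # [('tr', 'tch'), ('e', 'é'), ('u', 'ou')]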

2023

We present a cross-linguistic study in which the open source C-LARA platform was used to evaluate GPT-4’s ability to perform several key tasks relevant to Computer Assisted Language Learning. For each of the languages English, Farsi, Faroese, Mandarin and Russian, we instructed GPT-4, through C-LARA, to write six different texts, using prompts chosen to obtain texts of widely differing character. We then further instructed GPT-4 to annotate each text with segmentation markup, glosses and lemma/part-of-speech information; native speakers hand-corrected the texts and annotations to obtain error rates on the different component tasks. The C-LARA platform makes it easy to combine the results into a single multimodal document, further facilitating checking of their correctness. GPT-4’s performance varied widely across languages and processing tasks, but performance on different text genres was roughly comparable. In some cases, most notably glossing of English text, we found that GPT-4 was consistently able to revise its annotations to improve them.
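
A minimal sketch of the error-rate bookkeeping described above, assuming one machine annotation and one hand-corrected annotation per item; the data layout and the example glosses are invented.

# Per-task error rate = fraction of annotated items the native speakers
# had to correct. Data below are a hypothetical illustration.
def error_rate(machine_items, corrected_items):
    assert len(machine_items) == len(corrected_items)
    wrong = sum(m != c for m, c in zip(machine_items, corrected_items))
    return wrong / len(machine_items)

machine_glosses   = ["house", "big", "is", "red"]
corrected_glosses = ["house", "big", "be", "red"]
print(f"gloss error rate: {error_rate(machine_glosses, corrected_glosses):.0%}")  # 25%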

2022

We describe recent extensions to the open source Learning And Reading Assistant (LARA) supporting image-based and phonetically annotated texts. We motivate the utility of these extensions both in general and specifically in relation to endangered and archaic languages, and illustrate with examples from the revived Australian language Barngarla, Icelandic Sign Language, Irish Gaelic, Old Norse manuscripts and Egyptian hieroglyphics.
A popular idea in Computer Assisted Language Learning (CALL) is to use multimodal annotated texts, with annotations typically including embedded audio and translations, to support L2 learning through reading. An important question is how to create good quality audio, which can be done either through human recording or by a Text-To-Speech (TTS) engine. We may reasonably expect TTS to be quicker and easier, but human recording to be of higher quality. Here, we report a study using the open source LARA platform and ten languages. Samples of audio totalling about five minutes, representing the same four passages taken from LARA versions of Saint-Exupéry’s “Le petit prince”, were provided for each language in both human and TTS form; the passages were chosen to instantiate the 2×2 cross product of the conditions {dialogue, not-dialogue} and {humour, not-humour}. 251 subjects used a web form to compare human and TTS versions of each item and rate the voices as a whole. For the three languages where TTS did best, English, French and Irish, the evidence from this study and the previous one it extended suggests that TTS audio is now pedagogically adequate and roughly comparable with a non-professional human voice in terms of exemplifying correct pronunciation and prosody. It was however still judged substantially less natural and less pleasant to listen to. No clear evidence was found to support the hypothesis that dialogue and humour pose special problems for TTS. All data and software will be made freely available.
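
A sketch of how the pairwise judgements could be tallied over the 2×2 design described above; the record format (one preferred voice per comparison) and the example data are assumptions.

# Tally human-vs-TTS preferences per cell of the 2x2 design
# {dialogue, not-dialogue} x {humour, not-humour}. Records are invented.
from collections import Counter

judgements = [
    # (dialogue?, humour?, preferred voice)
    (True,  True,  "human"),
    (True,  False, "human"),
    (False, True,  "tts"),
    (False, False, "human"),
]

tallies = {}
for dialogue, humour, preferred in judgements:
    cell = ("dialogue" if dialogue else "not-dialogue",
            "humour" if humour else "not-humour")
    tallies.setdefault(cell, Counter())[preferred] += 1

for cell, counts in sorted(tallies.items()):
    print(cell, dict(counts))
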
We present an overview of LARA, the Learning And Reading Assistant, an open source platform for easy creation and use of multimedia annotated texts designed to support the improvement of reading skills. The paper is divided into three parts. In the first, we give a brief summary of LARA’s processing. In the second, we describe some generic functionality specially relevant for reading assistance: support for phonetically annotated texts, support for image-based texts, and integrated production of text-to-speech (TTS) generated audio. In the third, we outline some of the larger projects so far carried out with LARA, involving development of content for learning second and foreign (L2) languages such as Icelandic, Farsi, Irish, Old Norse and the Australian Aboriginal language Barngarla, where the issues involved overlap with those that arise when trying to help students improve first-language (L1) reading skills. All software and almost all content are freely available.

2020

LARA (Learning and Reading Assistant) is an open source platform whose purpose is to support easy conversion of plain texts into multimodal online versions suitable for use by language learners. This involves semi-automatically tagging the text, adding other annotations and recording audio. The platform is suitable for creating texts in multiple languages via crowdsourcing techniques that can be used for teaching a language via reading and listening. We present results of initial experiments by various collaborators where we measure the time required to produce substantial LARA resources, up to the length of short novels, in Dutch, English, Farsi, French, German, Icelandic, Irish, Swedish and Turkish. The first results are encouraging. Although there are some startup problems, the conversion task seems manageable for the languages tested so far. The resulting enriched texts are posted online and are freely available in both source and compiled form.

2016

We argue that the field of spoken CALL needs a shared task in order to facilitate comparisons between different groups and methodologies, and describe a concrete example of such a task, based on data collected from a speech-enabled online tool which has been used to help young Swiss German teens practise skills in English conversation. Items are prompt-response pairs, where the prompt is a piece of German text and the response is a recorded English audio file. The task is to label pairs as “accept” or “reject”, accepting responses which are grammatically and linguistically correct; the goal is to match a set of hidden gold standard answers as closely as possible. Initial resources are provided so that a scratch system can be constructed with a minimal investment of effort, and in particular without necessarily using a speech recogniser. Training data for the task will be released in June 2016, and test data in January 2017.
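
The official scoring for the shared task is defined by the organisers; as a stand-in, here is a hedged sketch that scores a system's accept/reject labels against gold labels using plain false-accept and false-reject rates.

# Score accept/reject labels against gold labels (illustrative metric only).
def score(system_labels, gold_labels):
    fa = fr = accepts = rejects = 0
    for sys_label, gold in zip(system_labels, gold_labels):
        if gold == "accept":
            accepts += 1
            fr += (sys_label == "reject")
        else:
            rejects += 1
            fa += (sys_label == "accept")
    return {"false_accept_rate": fa / rejects if rejects else 0.0,
            "false_reject_rate": fr / accepts if accepts else 0.0}

print(score(["accept", "reject", "accept"], ["accept", "accept", "reject"]))
# {'false_accept_rate': 1.0, 'false_reject_rate': 0.5}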

2014

We describe CALL-SLT, a speech-enabled Computer-Assisted Language Learning application where the central idea is to prompt the student with an abstract representation of what they are supposed to say, and then use a combination of grammar-based speech recognition and rule-based translation to rate their response. The system has been developed to the level of a mature prototype, freely deployed on the web, with versions for several languages. We present an overview of the core system architecture and the various types of content we have developed. Finally, we describe several evaluations, the last of which is a study carried out over about a week using 130 subjects recruited through the Amazon Mechanical Turk, in which CALL-SLT was contrasted against a control version where the speech recognition component was disabled. The improvement in student learning performance between the two groups was significant at p < 0.02.
We present an English-L2 child learner speech corpus, produced by 14-year-old Swiss German-L1 students in their third year of learning English, which is currently in the process of being collected. The collection method uses a web-enabled multimodal language game implemented using the CALL-SLT platform, in which subjects hold prompted conversations with an animated agent. Prompts consist of a short animated English-language video clip together with a German-language piece of text indicating the semantic content of the requested response. Grammar-based speech understanding is used to decide whether responses are accepted or rejected, and dialogue flow is controlled using a simple XML-based scripting language; the scripts are written to allow multiple dialogue paths, the choice being made randomly. The system is gamified using a score-and-badge framework with four levels of badges. We describe the application, the data collection and annotation procedures, and the initial tranche of data. The full corpus, when complete, should contain at least 5,000 annotated utterances.
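
A sketch of the kind of XML dialogue script with randomly chosen alternative paths mentioned above; the element and attribute names are invented, not the actual CALL-SLT scripting language.

# One dialogue step with two alternative paths; a path is picked at random
# per session. The XML schema here is hypothetical.
import random
import xml.etree.ElementTree as ET

SCRIPT = """
<dialogue>
  <step id="greet">
    <path><prompt video="greet_1.mp4" text="Begrüße den Kellner"/></path>
    <path><prompt video="greet_2.mp4" text="Frage nach einem Tisch"/></path>
  </step>
</dialogue>
"""

root = ET.fromstring(SCRIPT)
for step in root.findall("step"):
    chosen = random.choice(step.findall("path"))
    prompt = chosen.find("prompt")
    print(step.get("id"), "->", prompt.get("video"), "/", prompt.get("text"))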

2012

Mismatches between training and test data are a ubiquitous problem for real SMT applications. In this paper, we examine a type of mismatch that commonly arises when translating from French and similar languages: available training data is mostly formal register, but test data may well be informal register. We consider methods for defining surface transformations that map common informal language constructions into their formal language counterparts, or vice versa; we then describe two ways to use these mappings, either to create artificial training data or to pre-process source text at run-time. An initial evaluation performed using crowd-sourced comparisons of alternate translations produced by a French-to-English SMT system suggests that both methods can improve performance, with run-time pre-processing being the more effective of the two.
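
A sketch of the run-time pre-processing variant described above: surface transformations that map informal French constructions to their formal counterparts before translation. The specific rewrite rules are illustrative assumptions, not the paper's rule set.

# Map common informal constructions to their formal counterparts at run-time.
import re

RULES = [
    (re.compile(r"\bt'as\b", re.I), "tu as"),
    (re.compile(r"\bchais pas\b", re.I), "je ne sais pas"),
    (re.compile(r"\bje sais pas\b", re.I), "je ne sais pas"),  # restore dropped "ne"
]

def formalise(sentence):
    for pattern, replacement in RULES:
        sentence = pattern.sub(replacement, sentence)
    return sentence

print(formalise("t'as vu le film ? je sais pas"))
# tu as vu le film ? je ne sais pas
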
We describe an experiment carried out using a French version of CALL-SLT, a web-enabled CALL game in which students at each turn are prompted to give a semi-free spoken response which the system then either accepts or rejects. The central question we investigate is whether the system’s accept/reject decisions are appropriate; we do this by extracting pairs of utterances where both members of the pair are responses by the same student to the same prompt, and where one response is accepted and one rejected. When the two spoken responses are presented in random order, native speakers show a reasonable degree of agreement in judging that the accepted utterance is better than the rejected one. We discuss the significance of the results and also present a small study supporting the claim that native speakers are nearly always recognised by the system, while non-native speakers are rejected a significant proportion of the time.
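
A sketch of the pair-extraction step just described: group responses by (student, prompt) and keep accepted/rejected pairs, shuffling each pair so judges cannot tell which one the system accepted. The record format is an invented illustration.

# Extract matched accepted/rejected pairs for the same student and prompt.
import random
from collections import defaultdict

responses = [
    # (student, prompt, audio file, system decision)
    ("s1", "commander un café", "s1_p1_a.wav", "accept"),
    ("s1", "commander un café", "s1_p1_b.wav", "reject"),
    ("s2", "commander un café", "s2_p1_a.wav", "accept"),
]

groups = defaultdict(lambda: {"accept": [], "reject": []})
for student, prompt, audio, decision in responses:
    groups[(student, prompt)][decision].append(audio)

pairs = []
for key, g in groups.items():
    for accepted in g["accept"]:
        for rejected in g["reject"]:
            pair = [accepted, rejected]
            random.shuffle(pair)   # hide which response was accepted
            pairs.append((key, pair))

print(pairs)
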
We describe a scalable architecture, particularly well-suited to cloud-based computing, which can be used for Web-deployment of spoken dialogue systems. In common with similar platforms, like WAMI and the Nuance Mobile Developer Platform, we use a client/server approach in which speech recognition is carried out on the server side; our architecture, however, differs from these systems in offering considerably more elaborate server-side functionality, based on large-scale grammar-based language processing and generic dialogue management. We describe two substantial applications, built using our framework, which we argue would have been hard to construct in WAMI or NMDP. Finally, we present a series of evaluations carried out using CALL-SLT, a speech translation game, where we contrast performance in Web and desktop versions. Task Error Rate in the Web version is only slightly inferior to that in the desktop one, and the average additional latency is under half a second. The software is generally available for research purposes.
Speech and hand gestures offer the most natural modalities for everyday human-to-human interaction. The availability of diverse spoken dialogue applications and the proliferation of accelerometers on consumer electronics allow the introduction of new interaction paradigms based on speech and gestures. Little attention has been paid however to the manipulation of spoken dialogue systems through gestures. Situation-induced disabilities or real disabilities are determinant factors that motivate this type of interaction. In this paper we propose six concise and intuitively meaningful gestures that can be used to trigger the commands in any SDS. Using different machine learning techniques we achieve a classification error for the gesture patterns of less than 5%, and we also compare our own set of gestures to ones proposed by users. Finally, we examine the social acceptability of the specific interaction scheme and encounter high levels of acceptance for public use.
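
A sketch of accelerometer-based gesture classification of the kind described above, using scikit-learn; the window size, features, labels and classifier choice are assumptions (the data here are random, so the score is near chance, unlike the under-5% error reported on real gestures).

# Classify fixed-length accelerometer windows into six gesture classes.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 30 * 3))        # 120 windows, 30 samples x 3 axes
y = rng.integers(0, 6, size=120)          # six gesture classes

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
classifier = SVC(kernel="rbf").fit(X_train, y_train)
print("held-out accuracy:", classifier.score(X_test, y_test))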

2010

During the construction of a spoken dialogue system, much effort is spent on making the quality of speech recognition as high as possible. However, even if an application perfectly recognizes the input, its understanding may be far from what the user originally meant. The user should be informed about what the system actually understood so that an error will not have a negative impact on the later stages of the dialogue. One important aspect that this work tries to address is the effect of presenting the system’s understanding during interaction with users. We argue that for specific kinds of applications it is important to confirm the system’s understanding before obtaining the output. In this way the user can avoid misconceptions and problems occurring in the dialogue flow, and can gain confidence in the system. Nevertheless this has an impact on the interaction, as the mental workload increases and the user’s behavior may adapt to the system’s coverage. We focus on two applications that each implement the notion of rephrasing the user’s input in a different way. Our study took place among 14 subjects who used both systems on a Nokia N810 Internet Tablet.
We describe a multilingual Open Source CALL game, CALL-SLT, which reuses speech translation technology developed using the Regulus platform to create an automatic conversation partner that allows intermediate-level language students to improve their fluency. We contrast CALL-SLT with Wang and Seneff’s “translation game” system, in particular focussing on three issues. First, we argue that the grammar-based recognition architecture offered by Regulus is more suitable for this type of application; second, that it is preferable to prompt the student in a language-neutral form, rather than in the L1; and third, that we can profitably record successful interactions by native speakers and store them to be reused as online help for students. The current system, which will be demoed at the conference, supports four L2s (English, French, Japanese and Swedish) and two L1s (English and French). We conclude by describing an evaluation exercise, where a version of CALL-SLT configured for English L2 and French L1 was used by several hundred high school students. About half of the subjects reported positive impressions of the system.

2008

Particularly considering the requirement of high reliability, we argue that the most appropriate architecture for a medical speech translator that can be realised using today’s technology combines unidirectional (doctor to patient) translation, medium-vocabulary controlled language coverage, interlingua-based translation, an embedded help component, and deployability on a hand-held hardware platform. We present an overview of the Open Source MedSLT prototype, which has been developed in accordance with these design principles. The system is implemented on top of the Regulus and Nuance 8.5 platforms, translates patient examination questions for all language pairs in the set {English, French, Japanese, Arabic, Catalan}, using vocabularies of about 400 to 1,100 words, and can be run in a distributed client/server environment, where the client application is hosted on a Nokia Internet Tablet device.
We describe recent work on MedSLT, a medium-vocabulary interlingua-based medical speech translation system, focussing on issues that arise when handling languages of which the grammar engineer has little or no knowledge. We show how we can systematically create and maintain multiple forms of grammars, lexica and interlingual representations, with some versions being used by language informants, and some by grammar engineers. In particular, we describe the advantages of structuring the interlingua definition as a simple semantic grammar, which includes a human-readable surface form. We show how this allows us to rationalise the process of evaluating translations between languages lacking common speakers, and also makes it possible to create a simple generic tool for debugging to-interlingua translation rules. Examples presented focus on the concrete case of translation between Japanese and Arabic in both directions.
Regulus is an Open Source platform that supports construction of rule-based medium-vocabulary spoken dialogue applications. It has already been used to build several substantial speech-enabled applications, including NASA’s Clarissa procedure navigator and Geneva University’s MedSLT medical speech translator. Systems like these would be far more useful if they were available on a hand-held device, rather than, as with the present version, on a laptop. In this paper we describe the Open Source framework we have developed, which makes it possible to run Regulus applications on generally available mobile devices, using a distributed client-server architecture that offers transparent and reliable integration with different types of ASR systems. We describe the architecture, an implemented calendar application prototype hosted on a mobile device, and an evaluation. The evaluation shows that performance on the mobile device is as good as performance on a normal desktop PC.

2007

In any dialogue, elliptical utterances are very common. In this article, we evaluate their impact on recognition and translation in the MedSLT speech-to-speech translation system. Ellipsis resolution is performed there by a robust, portable method borrowed from human-machine dialogue systems. This method exploits a flat semantic representation and combines linguistic techniques (to build the representation) with example-based techniques (to learn, from a corpus, what constitutes a well-formed ellipsis in a given subdomain and how to resolve it).
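
A sketch of ellipsis resolution over a flat semantic representation, in the spirit described above: the elliptical follow-up is merged into the representation of the previous full question. The slot names and the merge rule are illustrative assumptions, not the actual MedSLT representation.

# Resolve an elliptical follow-up by overriding slots of the previous question.
def resolve_ellipsis(context_rep, ellipsis_rep):
    merged = dict(context_rep)
    merged.update(ellipsis_rep)   # the ellipsis overrides matching slots
    return merged

previous = {"utterance_type": "ynq", "symptom": "pain",
            "body_part": "head", "time": "morning"}
ellipsis = {"time": "evening"}    # "... and in the evening?"
print(resolve_ellipsis(previous, ellipsis))
# {'utterance_type': 'ynq', 'symptom': 'pain', 'body_part': 'head', 'time': 'evening'}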

2006

Today, the most common approach in spoken language processing is to combine a statistical recogniser with a robust parser. For many applications, however, grammar-based linguistic recognisers offer numerous advantages. In this article, we present a methodology and a set of open source tools (called Regulus) for rapidly deriving linguistically motivated recognisers from a general shared grammar for Catalan and French.
We present an overview of Regulus, an Open Source platform that supports corpus-based derivation of efficient domain-specific speech recognisers from general linguistically motivated unification grammars. We list available Open Source resources, which include compilers, resource grammars for various languages, documentation and a development environment. The greater part of the paper presents a series of experiments carried out using a medium-vocabulary medical speech translation application and a corpus of 801 recorded domain utterances, designed to investigate the impact on speech understanding performance of vocabulary size, grammatical coverage, presence or absence of various linguistic features, degree of generality of the grammar and use or otherwise of probabilistic weighting in the CFG language model. In terms of task accuracy, the most significant factors were the use of probabilistic weighting, the degree of generality of the grammar and the inclusion of features which model sortal restrictions.

2005

We present an overview of MedSLT, a medium-vocabulary medical speech translation system, focussing on the representational issues that arise when translating temporal and causal concepts. Although flat key/value structures are strongly preferred as semantic representations in speech understanding systems, we argue that it is infeasible to handle the necessary range of concepts using only flat structures. By exploiting the specific nature of the task, we show that it is possible to implement a solution which only slightly extends the representational complexity of the semantic representation language, by permitting an optional single nested level representing a subordinate clause construct. We sketch our solutions to the key problems of producing minimally nested representations using phrase-spotting methods, and writing cleanly structured rule-sets that map temporal and phrasal representations into a canonical interlingual form.
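
A sketch of the representational idea described above: a flat key/value list, extended by at most one nested level holding a subordinate clause. The keys and values are invented, not actual MedSLT interlingua symbols.

# Flat key/value semantics, with one optional nested level for a subordinate clause.
main_clause = [("utterance_type", "ynq"),
               ("symptom", "headache"),
               ("tense", "present")]

# "Does the headache appear when you drink coffee?"
with_sub_clause = main_clause + [
    ("sc", [("clause_type", "temporal"),
            ("event", "drink"),
            ("object", "coffee")])
]

def nesting_depth(rep):
    return 1 + max((nesting_depth(v) for _, v in rep if isinstance(v, list)),
                   default=0)

assert nesting_depth(main_clause) == 1
assert nesting_depth(with_sub_clause) == 2   # exactly one extra level
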
In this paper, we present evidence that providing users of a speech-to-speech translation system for emergency diagnosis (MedSLT) with a tool that helps them to learn the coverage greatly improves their success in using the system. In MedSLT, the system uses a grammar-based recogniser that provides more predictable results to the translation component. The help module aims at addressing the lack of robustness inherent in this type of approach. It takes as input the result of a robust statistical recogniser that performs better for out-of-coverage data and produces a list of in-coverage example sentences. These examples are selected from a defined list using a heuristic that prioritises sentences maximising the number of N-grams shared with those extracted from the recognition result.
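
A sketch of the help-module heuristic just described: rank in-coverage example sentences by the number of N-grams they share with the output of the backup statistical recogniser. The example sentences and the choice of N are assumptions.

# Rank in-coverage examples by shared N-grams with the recogniser output.
def ngrams(words, n):
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_ngram_count(hypothesis, example, max_n=2):
    h, e = hypothesis.lower().split(), example.lower().split()
    return sum(len(ngrams(h, n) & ngrams(e, n)) for n in range(1, max_n + 1))

IN_COVERAGE = ["where is the pain",
               "does the pain appear in the morning",
               "is the pain severe"]

recogniser_output = "pain appear when morning"
ranked = sorted(IN_COVERAGE,
                key=lambda ex: shared_ngram_count(recogniser_output, ex),
                reverse=True)
print(ranked[0])   # does the pain appear in the morning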

