Nikos Tsourakis

2022

A popular idea in Computer Assisted Language Learning (CALL) is to use multimodal annotated texts, with annotations typically including embedded audio and translations, to support L2 learning through reading. An important question is how to create good quality audio, which can be done either through human recording or by a Text-To-Speech (TTS) engine. We may reasonably expect TTS to be quicker and easier, but human to be of higher quality. Here, we report a study using the open source LARA platform and ten languages. Samples of audio totalling about five minutes, representing the same four passages taken from LARA versions of Saint-Exupèry’s “Le petit prince”, were provided for each language in both human and TTS form; the passages were chosen to instantiate the 2x2 cross product of the conditions dialogue, not-dialogue and humour, not-humour. 251 subjects used a web form to compare human and TTS versions of each item and rate the voices as a whole. For the three languages where TTS did best, English, French and Irish, the evidence from this study and the previous one it extended suggest that TTS audio is now pedagogically adequate and roughly comparable with a non-professional human voice in terms of exemplifying correct pronunciation and prosody. It was however still judged substantially less natural and less pleasant to listen to. No clear evidence was found to support the hypothesis that dialogue and humour pose special problems for TTS. All data and software will be made freely available.

2021

pdf abs
A Speech-enabled Fixed-phrase Translator for Healthcare Accessibility
Pierrette Bouillon | Johanna Gerlach | Jonathan Mutal | Nikos Tsourakis | Hervé Spechbach
Proceedings of the 1st Workshop on NLP for Positive Impact

In this overview article we describe an application designed to enable communication between health practitioners and patients who do not share a common language, in situations where professional interpreters are not available. Built on the principle of a fixed phrase translator, the application implements different natural language processing (NLP) technologies, such as speech recognition, neural machine translation and text-to-speech to improve usability. Its design allows easy portability to new domains and integration of different types of output for multiple target audiences. Even though BabelDr is far from solving the problem of miscommunication between patients and doctors, it is a clear example of NLP in a real world application designed to help minority groups to communicate in a medical context. It also gives some insights into the relevant criteria for the development of such an application.

2016

pdf
An Open Web Platform for Rule-Based Speech-to-Sign Translation
Manny Rayner | Pierrette Bouillon | Sarah Ebling | Johanna Gerlach | Irene Strasly | Nikos Tsourakis
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2014

pdf abs
Using a Serious Game to Collect a Child Learner Speech Corpus
Claudia Baur | Manny Rayner | Nikos Tsourakis
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present an English-L2 child learner speech corpus, produced by 14 year old Swiss German-L1 students in their third year of learning English, which is currently in the process of being collected. The collection method uses a web-enabled multimodal language game implemented using the CALL-SLT platform, in which subjects hold prompted conversations with an animated agent. Prompts consist of a short animated Engligh-language video clip together with a German-language piece of text indicating the semantic content of the requested response. Grammar-based speech understanding is used to decide whether responses are accepted or rejected, and dialogue flow is controlled using a simple XML-based scripting language; the scripts are written to allow multiple dialogue paths, the choice being made randomly. The system is gamified using a score-and-badge framework with four levels of badges. We describe the application, the data collection and annotation procedures, and the initial tranche of data. The full corpus, when complete, should contain at least 5,000 annotated utterances.

pdf
A tool for building multilingual voice questionnaires
Alejandro Armando | Pierrette Bouillon | Manny Rayner | Nikos Tsourakis
Proceedings of Translating and the Computer 36

2012

pdf abs
A Scalable Architecture For Web Deployment of Spoken Dialogue Systems
Matthew Fuchs | Nikos Tsourakis | Manny Rayner
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We describe a scalable architecture, particularly well-suited to cloud-based computing, which can be used for Web-deployment of spoken dialogue systems. In common with similar platforms, like WAMI and the Nuance Mobile Developer Platform, we use a client/server approach in which speech recognition is carried out on the server side; our architecture, however, differs from these systems in offering considerably more elaborate server-side functionality, based on large-scale grammar-based language processing and generic dialogue management. We describe two substantial applications, built using our framework, which we argue would have been hard to construct in WAMI or NMDP. Finally, we present a series of evaluations carried out using CALL-SLT, a speech translation game, where we contrast performance in Web and desktop versions. Task Error Rate in the Web version is only slightly inferior that in the desktop one, and the average additional latency is under half a second. The software is generally available for research purposes.

pdf abs
A Corpus for a Gesture-Controlled Mobile Spoken Dialogue System
Nikos Tsourakis | Manny Rayner
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Speech and hand gestures offer the most natural modalities for everyday human-to-human interaction. The availability of diverse spoken dialogue applications and the proliferation of accelerometers on consumer electronics allow the introduction of new interaction paradigms based on speech and gestures. Little attention has been paid however to the manipulation of spoken dialogue systems through gestures. Situation-induced disabilities or real disabilities are determinant factors that motivate this type of interaction. In this paper we propose six concise and intuitively meaningful gestures that can be used to trigger the commands in any SDS. Using different machine learning techniques we achieve a classification error for the gesture patterns of less than 5%, and we also compare our own set of gestures to ones proposed by users. Finally, we examine the social acceptability of the specific interaction scheme and encounter high levels of acceptance for public use.

2010

pdf abs
Examining the Effects of Rephrasing User Input on Two Mobile Spoken Language Systems
Nikos Tsourakis | Agnes Lisowska | Manny Rayner | Pierrette Bouillon
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

During the construction of a spoken dialogue system much effort is spent on improving the quality of speech recognition as possible. However, even if an application perfectly recognizes the input, its understanding may be far from what the user originally meant. The user should be informed about what the system actually understood so that an error will not have a negative impact in the later stages of the dialogue. One important aspect that this work tries to address is the effect of presenting the systems understanding during interaction with users. We argue that for specific kinds of applications its important to confirm the understanding of the system before obtaining the output. In this way the user can avoid misconceptions and problems occurring in the dialogue flow and he can enhance his confidence in the system. Nevertheless this has an impact on the interaction, as the mental workload increases, and the users behavior may adapt to the systems coverage. We focus on two applications that implement the notion of rephrasing users input in a different way. Our study took place among 14 subjects that used both systems on a Nokia N810 Internet Tablet.

We describe a multilingual Open Source CALL game, CALL-SLT, which reuses speech translation technology developed using the Regulus platform to create an automatic conversation partner that allows intermediate-level language students to improve their fluency. We contrast CALL-SLT with Wang's and Seneff's ``translation game'' system, in particular focussing on three issues. First, we argue that the grammar-based recognition architecture offered by Regulus is more suitable for this type of application; second, that it is preferable to prompt the student in a language-neutral form, rather than in the L1; and third, that we can profitably record successful interactions by native speakers and store them to be reused as online help for students. The current system, which will be demoed at the conference, supports four L2s (English, French, Japanese and Swedish) and two L1s (English and French). We conclude by describing an evaluation exercise, where a version of CALL-SLT configured for English L2 and French L1 was used by several hundred high school students. About half of the subjects reported positive impressions of the system.

2008

We describe recent work on MedSLT, a medium-vocabulary interlingua-based medical speech translation system, focussing on issues that arise when handling languages of which the grammar engineer has little or no knowledge. We show how we can systematically create and maintain multiple forms of grammars, lexica and interlingual representations, with some versions being used by language informants, and some by grammar engineers. In particular, we describe the advantages of structuring the interlingua definition as a simple semantic grammar, which includes a human-readable surface form. We show how this allows us to rationalise the process of evaluating translations between languages lacking common speakers, and also makes it possible to create a simple generic tool for debugging to-interlingua translation rules. Examples presented focus on the concrete case of translation between Japanese and Arabic in both directions.

pdf abs
Building Mobile Spoken Dialogue Applications Using Regulus
Nikos Tsourakis | Maria Georgescul | Pierrette Bouillon | Manny Rayner
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Regulus is an Open Source platform that supports construction of rule-based medium-vocabulary spoken dialogue applications. It has already been used to build several substantial speech-enabled applications, including NASAs Clarissa procedure navigator and Geneva Universitys MedSLT medical speech translator. System like these would be far more useful if they were available on a hand-held device, rather than, as with the present version, on a laptop. In this paper we describe the Open Source framework we have developed, which makes it possible to run Regulus applications on generally available mobile devices, using a distributed client-server architecture that offers transparent and reliable integration with different types of ASR systems. We describe the architecture, an implemented calendar application prototype hosted on a mobile device, and an evaluation. The evaluation shows that performance on the mobile device is as good as performance on a normal desktop PC.

pdf
Comparing two different bidirectional versions of the limited-domain medical spoken language translator MedSLT
Marianne Starlander | Pierrette Bouillon | Glenn Flores | Manny Rayner | Nikos Tsourakis
Proceedings of the 12th Annual conference of the European Association for Machine Translation

pdf abs
Many-to-Many Multilingual Medical Speech Translation on a PDA
Kyoko Kanzaki | Yukie Nakao | Manny Rayner | Marianne Santaholma | Marianne Starlander | Nikos Tsourakis
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Government and Commercial Uses of MT

Particularly considering the requirement of high reliability, we argue that the most appropriate architecture for a medical speech translator that can be realised using today’s technology combines unidirectional (doctor to patient) translation, medium-vocabulary controlled language coverage, interlingua-based translation, an embedded help component, and deployability on a hand-held hardware platform. We present an overview of the Open Source MedSLT prototype, which has been developed in accordance with these design principles. The system is implemented on top of the Regulus and Nuance 8.5 platforms, translates patient examination questions for all language pairs in the set {English, French, Japanese, Arabic, Catalan}, using vocabularies of about 400 to 1 100 words, and can be run in a distributed client/server environment, where the client application is hosted on a Nokia Internet Tablet device.

2007

Venues

lrec8
ws2
acl1
eamt1
nlp4posimpact1
show all...

amta1

tc1