2023
Validation of Language Agnostic Models for Discourse Marker Detection
Mariana Damova | Kostadin Mishev | Giedrė Valūnaitė-Oleškevičienė | Chaya Liebeskind | Purificação Silvano | Dimitar Trajanov | Ciprian-Octavian Truica | Elena-Simona Apostol | Christian Chiarcos | Anna Baczkowska
Proceedings of the 4th Conference on Language, Data and Knowledge
2022
ISO-based Annotated Multilingual Parallel Corpus for Discourse Markers
Purificação Silvano | Mariana Damova | Giedrė Valūnaitė Oleškevičienė | Chaya Liebeskind | Christian Chiarcos | Dimitar Trajanov | Ciprian-Octavian Truică | Elena-Simona Apostol | Anna Baczkowska
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Discourse markers carry information about discourse structure and organization, and also signal local dependencies or the epistemological stance of the speaker. They provide instructions on how to interpret the discourse, and their study is paramount to understanding the mechanism underlying discourse organization. This paper presents a new language resource, an ISO-based annotated multilingual parallel corpus for discourse markers. The corpus comprises nine languages: Bulgarian, Lithuanian, German, European Portuguese, Hebrew, Romanian, Polish, and Macedonian, with English as a pivot language. In order to represent the meaning of the discourse markers, we propose an annotation scheme of discourse relations from ISO 24617-8 with a plug-in to ISO 24617-2 for communicative functions. We describe an experiment in which we applied the annotation scheme to assess its validity. The results reveal that, although some extensions are required to cover all the multilingual data, it provides a proper representation of the value of discourse markers. Additionally, we report some relevant contrastive phenomena concerning the interpretation of discourse markers and their role in discourse. This first step will allow us to develop deep learning methods to identify and extract discourse relations and communicative functions, and to represent that information as Linguistic Linked Open Data (LLOD).
Using the LARA Little Prince to compare human and TTS audio quality
Elham Akhlaghi | Ingibjörg Iða Auðunardóttir | Anna Bączkowska | Branislav Bédi | Hakeem Beedar | Harald Berthelsen | Cathy Chua | Catia Cucchiarin | Hanieh Habibi | Ivana Horváthová | Junta Ikeda | Christèle Maizonniaux | Neasa Ní Chiaráin | Chadi Raheb | Manny Rayner | John Sloan | Nikos Tsourakis | Chunlin Yao
Proceedings of the Thirteenth Language Resources and Evaluation Conference
A popular idea in Computer Assisted Language Learning (CALL) is to use multimodal annotated texts, with annotations typically including embedded audio and translations, to support L2 learning through reading. An important question is how to create good quality audio, which can be done either through human recording or by a Text-To-Speech (TTS) engine. We may reasonably expect TTS to be quicker and easier, but human recording to be of higher quality. Here, we report a study using the open source LARA platform and ten languages. Samples of audio totalling about five minutes, representing the same four passages taken from LARA versions of Saint-Exupéry’s “Le petit prince”, were provided for each language in both human and TTS form; the passages were chosen to instantiate the 2x2 cross product of the conditions dialogue/not-dialogue and humour/not-humour. 251 subjects used a web form to compare human and TTS versions of each item and rate the voices as a whole. For the three languages where TTS did best, English, French and Irish, the evidence from this study and the previous one it extended suggests that TTS audio is now pedagogically adequate and roughly comparable with a non-professional human voice in terms of exemplifying correct pronunciation and prosody. It was, however, still judged substantially less natural and less pleasant to listen to. No clear evidence was found to support the hypothesis that dialogue and humour pose special problems for TTS. All data and software will be made freely available.