Brigitte Bigi
La Langue française Parlée Complétée (LfPC) is a communication system developed for deaf people to complement lip reading with one hand, at the phonetic level. It is used by children to acquire reading, lip-reading and oral communication skills. The main objective is to enable deaf children to become proficient readers and speakers of French. We propose a proof of concept (PoC) of an augmented reality system that automatically places the representation of a cueing hand on the pre-recorded video of a speaker. The PoC predicts the shape and position of the hand, the time during which it must be displayed, and its coordinates relative to the face in the video. Photographs of hands are then superimposed on the video. Videos automatically annotated by the PoC were shown to deaf people, who received and evaluated it favorably.
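A minimal sketch of the overlay step, assuming frames, hand photos and alpha masks are NumPy arrays and that a face box is already available from a detector; the function name, the face-relative coordinate convention and the blending are illustrative, not the PoC's actual implementation:

```python
import numpy as np

def overlay_hand(frame, hand, alpha, face_box, rel_xy):
    """Alpha-blend a hand photo onto a video frame at coordinates
    expressed relative to the detected face box (x, y, w, h).
    Assumes the placement lies inside the frame (px, py >= 0)."""
    x, y, w, h = face_box
    # rel_xy gives the key placement as a fraction of the face box,
    # e.g. (1.1, 0.5) = just right of the face, at mid-height.
    px, py = int(x + rel_xy[0] * w), int(y + rel_xy[1] * h)
    # Clip the patch so it stays inside the frame.
    hh = max(0, min(hand.shape[0], frame.shape[0] - py))
    hw = max(0, min(hand.shape[1], frame.shape[1] - px))
    a = alpha[:hh, :hw, None]                  # per-pixel opacity in [0, 1]
    roi = frame[py:py + hh, px:px + hw]
    frame[py:py + hh, px:px + hw] = (a * hand[:hh, :hw]
                                     + (1 - a) * roi).astype(frame.dtype)
    return frame
```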
In this position paper we present a methodology to automatically annotate French text for Cued Speech (CS), a communication system developed for people with hearing loss to complement speech reading at the phonetic level. This visual communication mode uses handshapes in different placements near the face, in combination with the mouth movements (called ‘cues’ or ‘keys’), to make the phonemes of spoken language look different from each other. CS is used to acquire skills in lip reading, in oral communication and for reading. Despite many studies demonstrating its benefits, there are few resources available for learning and practicing it, especially in French. We thus propose a methodology to phonemize written corpora so that each word is aligned with the corresponding CS key(s). This methodology is proposed as part of a wider project aimed at creating an augmented reality system displaying a virtual coding hand, where the user will be able to choose a text according to its complexity for cueing.
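As an illustration of the word-to-key alignment, a minimal sketch in Python; the handshape and placement tables below are deliberately tiny, invented fragments (the real LfPC tables map all French consonants to 8 handshapes and all vowels to 5 placements), and the one-key-per-CV-syllable assumption follows the abstract:

```python
# Illustrative fragments only, not the authoritative LfPC tables.
HANDSHAPE = {"p": 1, "k": 2, "b": 4, "t": 5, "m": 5, "l": 6}   # consonant -> handshape id
POSITION = {"a": "side", "i": "cheek", "o": "chin", "e": "throat"}  # vowel -> placement

def keys_for_syllables(syllables):
    """Map a phonemized word, given as CV-like syllables, to CS keys:
    handshape from the consonant, placement from the vowel. A lone
    consonant gets a default placement; a lone vowel a default handshape."""
    keys = []
    for syl in syllables:
        cons = [p for p in syl if p in HANDSHAPE]
        vows = [p for p in syl if p in POSITION]
        shape = HANDSHAPE[cons[0]] if cons else 5
        place = POSITION[vows[0]] if vows else "side"
        keys.append((shape, place))
    return keys

print(keys_for_syllables([["b", "o"], ["t", "a"]]))  # [(4, 'chin'), (5, 'side')]
```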
Cued Speech is a communication system developed for deaf people to complement speechreading at the phonetic level with hands. This visual communication mode uses handshapes in different placements near the face, in combination with the mouth movements of speech, to make the phonemes of spoken language look different from each other. This paper describes CLeLfPC - Corpus de Lecture en Langue française Parlée Complétée, a corpus of French Cued Speech. It consists of about 4 hours of audio and HD video recordings of 23 participants. The recordings comprise 160 different isolated ‘CV’ syllables repeated 5 times, 320 words or phrases repeated 2-3 times, and about 350 sentences repeated 2-3 times. The corpus is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. It can be used for any further research or teaching purpose. The corpus includes an orthographic transliteration and other phonetic annotations on 5 of the recorded topics, i.e. syllables, words, isolated sentences and a text. The early results are encouraging: it seems that 1/ the hand position has a strong influence on the key audio duration, and 2/ the hand shape does not.
The aim of this work is to quantify theoretical articulatory positions during the production of spontaneous speech in three languages. Each language has a specific phonological inventory, but these specificities are not represented as such in spontaneous speech, in which phonemes do not all occur with the same frequency. We compared three languages (Polish, French and American English) showing notable differences in their phonological inventories. Articulatory positions were computed on the basis of phoneme frequencies in each of the three languages, using corpora of spontaneous speech. Surprisingly, the results tend to show that the predominant articulatory positions are very similar across the three languages. It thus seems that the use of spontaneous speech, and hence the distribution of phonemes in the languages, smooths out the disparities between phonological systems and tends toward a common articulatory setting. Further investigations will have to verify this observation.
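The computation described can be sketched as a frequency-weighted average over per-phoneme articulatory coordinates; the feature values below are invented placeholders, not the study's actual measures:

```python
from collections import Counter

# Invented (frontness, height) coordinates per vowel, in [0, 1].
ARTIC = {"i": (1.0, 1.0), "e": (1.0, 0.6), "a": (0.5, 0.0),
         "o": (0.0, 0.6), "u": (0.0, 1.0)}

def mean_position(phoneme_stream):
    """Frequency-weighted mean articulatory position over a corpus;
    assumes the stream contains at least one phoneme listed in ARTIC."""
    counts = Counter(p for p in phoneme_stream if p in ARTIC)
    total = sum(counts.values())
    front = sum(n * ARTIC[p][0] for p, n in counts.items()) / total
    height = sum(n * ARTIC[p][1] for p, n in counts.items()) / total
    return front, height

print(mean_position("iiaou"))  # approximately (0.5, 0.72)
```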
Cheese! is a conversational corpus. It consists of 11 French face-to-face conversations lasting around 15 minutes each. Cheese! is a duplication of an American corpus (ref), designed to allow a cross-cultural comparison of participants’ smiling behavior in humorous and non-humorous sequences in American English and French conversations. In this article, the methodology used to collect and enrich the corpus is presented: experimental protocol, technical choices, transcription, semi-automatic annotations, and manual annotations of smiling and humor. An exploratory study investigating the links between smiling and humor is then proposed. Based on the analysis of two interactions, two questions are asked: (1) Does smiling frame humor? (2) Does smiling have an impact on its success or failure? While the experimental design of Cheese! was developed specifically to study smiles and humor in conversations, the high quality of the dataset obtained and the methodology used are replicable and can be applied to the analysis of many other conversational activities and modalities.
In this paper we present an investigation of real-life, bi-directional conversations. We introduce a multimodal corpus derived from natural conversations alternating between human-human and human-robot interactions. The human-robot interactions served as a control condition for the social nature of the human-human conversations. The experimental setup consisted of conversations between a participant in a functional magnetic resonance imaging (fMRI) scanner and a human confederate or conversational robot outside the scanner room, connected via bidirectional audio and unidirectional videoconferencing (from the outside to inside the scanner). A cover story provided a framework for natural, real-life conversations about images from an advertisement campaign. During the conversations we collected a multimodal corpus for a comprehensive characterization of bi-directional conversations. This corpus includes neural data from fMRI, physiological data (blood flow pulse and respiration), transcribed conversational data, as well as face and eye-tracking recordings. We thus present a unique corpus for studying human conversations that includes neural, physiological and behavioral data.
The analysis of the structure of speech nearly always rests on the alignment of the speech recording with a phonetic transcription. Nowadays several tools can perform this speech segmentation automatically. However, none of them allows the automatic segmentation of Quebec French (QF hereafter), the acoustics and phonotactics of QF differing widely from those of France French (FF hereafter). To adequately segment QF, features like the diphthongization of long vowels and the affrication of coronal stops have to be taken into account. Thus, acoustic models for automatic segmentation must be trained on speech samples exhibiting those phenomena. Dictionaries and lexicons must also be adapted to integrate differences in lexical units and in the phonology of QF. This paper presents the development of linguistic resources to be included in the SPPAS software tool in order to perform Text normalization, Phonetization, Alignment and Syllabification. We adapted the existing French lexicon and developed a QF-specific pronunciation dictionary. We then created an acoustic model from the existing ones and adapted it with 5 minutes of manually time-aligned data. These new resources are all freely distributed with SPPAS version 2.7; they perform the full process of speech segmentation in Quebec French.
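One of the QF adaptations mentioned, the affrication of coronal stops, can be illustrated as a rule deriving pronunciation-dictionary variants; the phoneme symbols and the triggering context are simplified here, not the dictionary's actual rule set:

```python
HIGH_FRONT = {"i", "y"}  # contexts assumed to trigger affrication in QF

def add_qf_variant(pron):
    """Derive a Quebec-French pronunciation variant from a France-French
    one: affricate coronal stops before high front vowels (t -> ts, d -> dz).
    `pron` is a list of phoneme symbols for one dictionary entry."""
    out = []
    for i, p in enumerate(pron):
        nxt = pron[i + 1] if i + 1 < len(pron) else None
        if p == "t" and nxt in HIGH_FRONT:
            out.append("ts")
        elif p == "d" and nxt in HIGH_FRONT:
            out.append("dz")
        else:
            out.append(p)
    return out

print(add_qf_variant(["p", "t", "i"]))  # ['p', 'ts', 'i'], e.g. 'petit' -> [ptsi]
```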
This study aims to better understand the distribution of the phonetic reductions that occur in speech production. We selected the set of shortest phonemes (30 ms) from the alignment of a conversational speech corpus. This version containing only the short phonemes (V1) is compared with the version containing the alignment of all the phonemes of the corpus (V0). Both versions are related to the annotation of words and their syntactic categories. The results show that liquids, glides and close vowels are more represented in V1 than in V0. Moreover, the nature and syntactic category of words modulate the distribution of phonemes in V1. Thus, the unstable nature of /l/, together with its presence in a great many pronouns and determiners, makes it the phoneme most affected by reduction. Finally, word frequency seems to show contradictory effects.
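A minimal sketch of how the V0/V1 views could be derived from a time-aligned corpus; the tuple format and the over-representation ratio are assumptions for illustration, not the study's exact procedure:

```python
from collections import Counter

def alignment_views(alignment, max_dur=0.030):
    """V0 = all aligned phonemes; V1 = only those lasting at most 30 ms.
    `alignment` holds (start_s, end_s, label) tuples from forced alignment
    and is assumed to contain at least one short phoneme."""
    v0 = Counter(lab for s, e, lab in alignment)
    v1 = Counter(lab for s, e, lab in alignment if (e - s) <= max_dur)
    n0, n1 = sum(v0.values()), sum(v1.values())
    # Over-representation ratio per phoneme: > 1 means the phoneme is
    # relatively more frequent among short tokens (V1) than overall (V0).
    return {lab: (v1[lab] / n1) / (v0[lab] / n0) for lab in v1}
```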
This paper presents a quantitative description of laughter in eight 1-hour French spontaneous conversations. The paper includes the raw figures for laughter as well as further details concerning inter-individual variability. It first describes to what extent the amount of laughter and its duration vary from speaker to speaker across all dialogues. In a second set of analyses, the paper compares our corpus with previously analyzed corpora. In a final set of experiments, it presents some facts about overlapping laughs. This paper quantifies all these effects in free-style conversations for the first time.
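Overlapping laughs of the kind quantified here can be found with a simple interval-intersection test; this brute-force sketch assumes laughs are given as (start, end) pairs per speaker:

```python
def overlapping_laughs(laughs_a, laughs_b):
    """Pairs of laugh intervals from two speakers that overlap in time.
    Each laugh is a (start, end) pair in seconds; brute force is adequate
    for the number of laughs found in one dialogue."""
    return [(a, b) for a in laughs_a for b in laughs_b
            if a[0] < b[1] and b[0] < a[1]]

print(overlapping_laughs([(1.0, 2.5)], [(2.0, 3.0), (5.0, 6.0)]))
# [((1.0, 2.5), (2.0, 3.0))]
```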
This paper presents the TYPALOC corpus of French dysarthric and healthy speech and the rationale underlying its constitution. The objective is to compare phonetic variation in the speech of dysarthric vs. healthy speakers in different speech conditions (read and unprepared speech). More precisely, we aim to compare the extent, types and location of phonetic variation within these different populations and speech conditions. The TYPALOC corpus consists of a selection of 28 dysarthric patients (three different pathologies) and 12 healthy control speakers, recorded while reading the same text and in a more natural continuous speech condition. Each audio signal has been segmented into Inter-Pausal Units. The corpus has then been manually transcribed and automatically aligned, and the alignment has been corrected by an expert phonetician. Moreover, the corpus benefits from automatic syllabification and from an Automatic Detection of Acoustic Phone-Based Anomalies. Finally, in order to interpret phonetic variations due to pathologies, a perceptual evaluation of each patient has been conducted. Quantitative data are provided at the end of the paper.
The question of interoperability for linguistic annotated resources covers different aspects. First, it requires a representation framework making it possible to compare, and possibly merge, different annotation schemas. In this paper, a general description level for representing multimodal linguistic annotations is proposed. It focuses on time representation and on the representation of data content: the paper reconsiders and enhances the current, generalized representation of annotations. An XML schema for such annotations is proposed, together with a Python API. This framework is implemented in multi-platform software distributed under the terms of the GNU General Public License.
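A minimal sketch of the time-representation idea, where a time point carries an uncertainty radius so that boundaries produced by different annotation tools can still be compared and merged; the class names are illustrative, not the proposed API:

```python
from dataclasses import dataclass

@dataclass
class TimePoint:
    """A time value with an uncertainty radius, so points produced by
    different annotation tools can still be compared and merged."""
    midpoint: float
    radius: float = 0.0

    def __eq__(self, other):
        # Two points match if their uncertainty ranges overlap.
        return abs(self.midpoint - other.midpoint) <= self.radius + other.radius

@dataclass
class Annotation:
    begin: TimePoint
    end: TimePoint
    label: str

# 0.310 s and 0.312 s count as the same boundary within a 5 ms radius.
print(TimePoint(0.310, 0.005) == TimePoint(0.312, 0.005))  # True
```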
This paper investigates the discursive phenomenon called other-repetition (OR), particularly in the context of spontaneous French dialogues. It focuses on its automatic detection and characterization. A method is proposed to retrieve ORs automatically: the detection is based on rules applied to the lexical material only. This automatic detection process has been used to label other-repetitions in 8 dialogues of the CID - Corpus of Interactional Data. The evaluation performed on one speaker gives good results, with an F1-measure of 0.85. The retrieved OR occurrences are then described statistically: number of words, distance, etc.
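A rough sketch of purely lexical OR detection; the stopword list, the turn window and the matching rule below are assumptions standing in for the paper's actual rules:

```python
STOPWORDS = {"euh", "le", "la", "et", "de"}   # illustrative filler/function words

def detect_or(turns, window=5):
    """Flag other-repetitions: a content word produced by one speaker and
    repeated by the other speaker within the next `window` turns.
    `turns` is a list of (speaker, [words]) pairs."""
    hits = []
    for i, (spk, words) in enumerate(turns):
        source = {w for w in words if w not in STOPWORDS}
        for j in range(i + 1, min(i + 1 + window, len(turns))):
            other_spk, other_words = turns[j]
            repeated = source & set(other_words) if other_spk != spk else set()
            if repeated:
                hits.append((i, j, sorted(repeated)))
    return hits

print(detect_or([("A", ["le", "petit", "chat"]),
                 ("B", ["ah", "petit", "chat", "oui"])]))
# [(0, 1, ['chat', 'petit'])]
```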
This paper introduces the Aix Map Task corpus, a corpus of audio and video recordings of task-oriented dialogues, modelled after the original HCRC Map Task corpus. The lexical material was designed for the analysis of speech and prosody, as described in Astésano et al. (2007). The design of the lexical material, the protocol and some basic quantitative features of the existing corpus are presented. The corpus was collected under two communicative conditions, an audio-only condition and a face-to-face condition. The recordings took place in a studio and in a sound-attenuated booth respectively, with head-set microphones (and, in the face-to-face condition, two video cameras). The recordings have been segmented into Inter-Pausal Units and transcribed using conventions covering both actual productions and canonical forms of what was said. The corpus is made publicly available online.
SPPAS is a tool to produce automatic annotations, including utterance, word, syllabic and phonemic segmentations, from a recorded speech sound and its transcription. SPPAS is distributed under the terms of the GNU General Public License. It was successfully applied during the Evalita 2011 campaign to Italian map-task dialogues. It can also deal with French, English and Chinese, and there is an easy way to add other languages. The paper describes the development of the resources and free tools, consisting of acoustic models, phonetic dictionaries, and the libraries and programs to deal with these data. All of them are publicly available.
This paper addresses the problem of the enrichment of transcriptions from the perspective of automatic phonetization. Phonetization is the process of representing sounds with phonetic signs. There are two general ways to build a phonetization process: rule-based systems (with rules based on inference approaches or proposed by expert linguists) and dictionary-based solutions, which consist in storing a maximum of phonological knowledge in a lexicon. In both cases, phonetization is based on a manual transcription. Such a transcription is established on the basis of conventions that can differ depending on the context in which they were developed. The present study focuses on three different enrichments of such a transcription. Evaluations compare phonetizations obtained from automatic systems to a manually phonetized reference. The test corpus is made of three types of speech: conversational speech, read speech and political debate. A specific algorithm for the rule-based system is proposed to deal with enrichments. The final system obtains about 95.2% correct phonetization (from 3.7% to 5.6% error rates depending on the corpus).
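The dictionary-based side of phonetization can be sketched as a lexicon lookup with a stub for out-of-vocabulary tokens (which a rule-based grapheme-to-phoneme step would handle); the lexicon entries and phoneme notation are illustrative:

```python
LEXICON = {"bonjour": "b-O~-Z-u-R", "madame": "m-a-d-a-m"}   # toy, SAMPA-like entries

def phonetize(tokens, g2p=lambda t: "UNK"):
    """Dictionary-based phonetization: look each token up in the lexicon;
    out-of-vocabulary tokens fall back to a rule-based grapheme-to-phoneme
    step, stubbed here as `g2p`."""
    return [LEXICON.get(tok.lower(), g2p(tok)) for tok in tokens]

print(phonetize(["Bonjour", "madame", "Dupont"]))
# ['b-O~-Z-u-R', 'm-a-d-a-m', 'UNK']
```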
This article deals with the analysis of political debates from a multimodal perspective. More specifically, we study the responses to interruptions during a debate at the French National Assembly. We propose to carry out the analysis through systematic annotations of different modalities. The argumentative analysis led us to propose a typology of these responses, which was then put to the test of automatic classification. The difficulty in building such a system lies in the very nature of the data: multimodal, sometimes missing and uncertain.
This paper presents the outline and performance of an automatic syllable-boundary detection system. The syllabification of phonemes is performed with a rule-based system implemented in a Java program. Phonemes are categorized into 6 classes. A set of specific rules is developed and divided into general rules, which apply in all cases, and exception rules, which apply in specific situations. These rules were developed on a French spontaneous speech corpus. Moreover, the phonemes, classes and rules are listed in an external configuration file of the tool (under GPL licence), which makes the tool very easy to adapt to a specific corpus: rules, phoneme encoding or phoneme classes can be added or modified simply by using a new configuration file. Finally, performance is evaluated and compared to 3 other French syllabification systems, showing significant improvements. The automatic system's output and an expert's syllabification agree on most syllable boundaries in our corpus.
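A minimal sketch of a general rule of this kind, cutting between two vowels before the consonant of lowest sonority; the 6-way classification below is an invented placeholder, and the exception rules are omitted:

```python
# Invented 6-way phoneme classification with sonority ranks.
CLASS = {"a": 5, "e": 5, "i": 5, "o": 5, "u": 5,     # vowels
         "l": 4, "r": 4,                             # liquids
         "j": 3, "w": 3,                             # glides
         "m": 2, "n": 2,                             # nasals
         "v": 1, "z": 1,                             # voiced fricatives
         "p": 0, "t": 0, "k": 0, "s": 0}             # obstruents

def syllable_boundaries(phonemes):
    """General rule: between two vowels, cut before the consonant with the
    lowest sonority. Exception rules for specific clusters, as in the
    configuration file described above, would override this."""
    vowels = [i for i, p in enumerate(phonemes) if CLASS.get(p, 0) == 5]
    cuts = []
    for v1, v2 in zip(vowels, vowels[1:]):
        between = range(v1 + 1, v2)
        if between:
            cuts.append(min(between, key=lambda i: CLASS.get(phonemes[i], 0)))
    return cuts

print(syllable_boundaries(list("alto")))  # [2]: boundary before 't', i.e. 'al.to'
```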
This article presents our first work towards building a probabilistic machine translation system for the Vietnamese-French language pair. Since Vietnamese is considered an under-resourced language, one of the difficulties lies in building the parallel corpora that are essential for training the models. We focus on the construction of a large Vietnamese-French parallel corpus. An automatic method for identifying parallel document pairs, based on publication dates, special words and sentence alignment scores, is applied. This article also presents the construction of a first probabilistic Vietnamese-French and French-Vietnamese machine translation system from this corpus, and discusses whether lexical or sub-lexical units should be used for Vietnamese (syllables, words, or their combinations). The performance of the system is encouraging and compares favorably with that of Google's system.
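The pair-identification heuristic can be sketched as a score combining date proximity and shared special words; the weighting, threshold and field names below are invented for illustration, not the paper's actual formula:

```python
from datetime import date

def pair_score(doc_fr, doc_vi, max_days=2):
    """Heuristic score for a candidate parallel document pair, combining
    publication-date proximity and shared 'special words' (numbers, named
    entities left untranslated). A third cue, sentence alignment scores,
    would refine the decision and is omitted here."""
    day_gap = abs((doc_fr["date"] - doc_vi["date"]).days)
    if day_gap > max_days:
        return 0.0
    shared = len(doc_fr["special_words"] & doc_vi["special_words"])
    total = len(doc_fr["special_words"] | doc_vi["special_words"]) or 1
    return (1.0 - day_gap / (max_days + 1)) * (shared / total)

fr = {"date": date(2008, 5, 2), "special_words": {"ASEAN", "2008", "Hanoi"}}
vi = {"date": date(2008, 5, 3), "special_words": {"ASEAN", "2008", "Hanoi", "WTO"}}
print(round(pair_score(fr, vi), 3))  # 0.5
```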
In this article, we address the problem of statistical language modeling for under-resourced languages without word segmentation. While the lack of textual data affects model performance, the errors introduced by automatic segmentation can make these data even less usable. To make the best use of the textual data, we propose a method that performs multiple segmentations of the training corpus instead of a single segmentation. This method, based on finite-state automata, recovers the n-grams missed by the single segmentation and generates new n-grams for language model training. Applied to training language models for Khmer and Vietnamese automatic speech recognition systems, this approach outperformed the single, rule-based segmentation method.
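The multiple-segmentation idea can be illustrated by enumerating all lexicon-licensed segmentations of an unsegmented string, i.e. all paths through the segmentation automaton; this naive recursive sketch is for illustration, the paper's finite-state implementation being far more efficient:

```python
def segmentations(text, lexicon):
    """Enumerate every segmentation of an unsegmented string licensed by
    the lexicon. Each path contributes its n-grams to language model
    training, instead of keeping a single segmentation."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results

lex = {"a", "b", "c", "ab", "abc"}
print(segmentations("abc", lex))
# [['a', 'b', 'c'], ['ab', 'c'], ['abc']]
```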
In this paper we present an overview of the development of a large vocabulary continuous speech recognition (LVCSR) system for Khmer, the official language of Cambodia, spoken by more than 15 million people. As Khmer is an under-resourced language, developing an LVCSR system for it is a challenging task. We describe our methodologies for quick language data collection and processing for language modeling and acoustic modeling. For language modeling, we investigate the use of words and sub-words as basic modeling units in order to assess the potential of sub-word units in the case of an unsegmented language like Khmer. Grapheme-based acoustic modeling is used to quickly build our Khmer acoustic model. Furthermore, the approaches and tools used for the development of our system are documented and made publicly available on the web. We hope this will help accelerate the development of LVCSR systems for new languages, especially for under-resourced languages of developing countries where resources and expertise are limited.
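Grapheme-based lexicon construction can be sketched as splitting a word into base characters with their combining marks attached; this is a rough, script-agnostic approximation, and real Khmer processing would also need script-specific handling (e.g. subscript consonants):

```python
import unicodedata

def grapheme_units(word):
    """One acoustic-model unit per base character, with combining marks
    kept attached to their base, so a pronunciation lexicon can be
    generated without hand-written phonetic rules."""
    units = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.category(ch).startswith("M") and units:
            units[-1] += ch          # keep the mark with its base character
        else:
            units.append(ch)
    return units

print(grapheme_units("việt"))  # 4 units: v, i, e+marks, t
```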
The work presented in this article was carried out within a larger speech-to-speech machine translation project. The translation approach is based on a pivot language, or Interchange Format (IF), which represents the meaning of the sentence independently of the language. We propose a method that integrates semantic information into the statistical language model of the Automatic Speech Recognition system. The principle is to use certain classes defined in the IF as semantic classes in the language model. This allows the speech recognition system to partially analyze speech turns into IF. The experiments carried out show that with this approach, the recognition system can directly analyze into IF part of the dialogue data of our application (35% of words; 58% of speech turns) without resorting to the translation system, while maintaining the same level of overall system performance.
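The use of IF classes in the language model can be illustrated by rewriting the training text so that class members are replaced by class tags, yielding a class-based n-gram model; the classes and members below are invented examples, not actual IF definitions:

```python
# Invented semantic classes standing in for IF-defined classes.
IF_CLASSES = {
    "price": {"price", "cost", "rate"},
    "room_type": {"single", "double", "suite"},
}
WORD2CLASS = {w: c for c, ws in IF_CLASSES.items() for w in ws}

def classify_tokens(tokens):
    """Replace members of IF semantic classes by their class tag, so the
    n-gram model is trained over mixed word/class sequences."""
    return [f"[{WORD2CLASS[t]}]" if t in WORD2CLASS else t for t in tokens]

print(classify_tokens("what is the price of a double room".split()))
# ['what', 'is', 'the', '[price]', 'of', 'a', '[room_type]', 'room']
```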
Statistical language models aim to provide a statistical representation of the language, but suffer from many imperfections. Recent work has shown that these models can be improved if they can benefit from knowledge of the topic being addressed, in order to adapt to it. The topic of the document is then obtained by a topic identification mechanism; however, the topics handled in this way often have different granularities, which is why it seems appropriate to organize them in a hierarchy. This structuring of topics requires specific topic identification techniques. This article proposes a unigram-based statistical model to automatically identify the topic of a document within a predefined tree of possible topics. We also present a criterion that allows the model to attach a degree of reliability to the decision made. All the experiments were carried out on data extracted from the ‘fr’ group of discussion forums.
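A minimal sketch of unigram-based topic scoring with a reliability flag; the margin criterion below is an assumed stand-in for the paper's reliability measure, and at least two topic nodes are assumed:

```python
import math

def log_score(doc_words, unigrams, floor=1e-9):
    """Log-probability of a document under one topic node's unigram model,
    with a crude probability floor for unseen words."""
    return sum(math.log(unigrams.get(w, floor)) for w in doc_words)

def identify_topic(doc_words, topics, margin=2.0):
    """Pick the best-scoring node of the topic tree and flag the decision
    as reliable only when it beats the runner-up by `margin`."""
    ranked = sorted(((log_score(doc_words, uni), name)
                     for name, uni in topics.items()), reverse=True)
    (s1, best), (s2, _) = ranked[0], ranked[1]
    return best, (s1 - s2) >= margin

topics = {"sport": {"match": 0.02, "but": 0.01},
          "informatique": {"logiciel": 0.02, "code": 0.01}}
print(identify_topic(["logiciel", "code", "code"], topics))  # ('informatique', True)
```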