Jade Copet

2021

We introduce Generative Spoken Language Modeling, the task of learning the acoustic and linguistic characteristics of a language from raw audio (no text, no labels), and a set of metrics to automatically evaluate the learned representations at acoustic and linguistic levels for both encoding and generation. We set up baseline systems consisting of a discrete speech encoder (returning pseudo-text units), a generative language model (trained on pseudo-text), and a speech decoder (generating a waveform from pseudo-text), all trained without supervision, and we validate the proposed metrics with human evaluation. Across 3 speech encoders (CPC, wav2vec 2.0, HuBERT), we find that the number of discrete units (50, 100, or 200) matters in a task-dependent and encoder-dependent way, and that some combinations approach text-based systems.
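The encoder-to-language-model half of the pipeline described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' system. The abstract does not say how continuous encoder features become discrete units. This sketch assumes k-means clustering of frame-level feature vectors, with consecutive repeated units collapsed. It also swaps in a simple add-one-smoothed bigram model for the paper's generative language model. The names `kmeans`, `encode`, and `BigramUnitLM` are hypothetical. The input "frames" are synthetic vectors standing in for CPC, wav2vec 2.0, or HuBERT features. The speech decoder (pseudo-text to waveform) is omitted.

```python
import random
from collections import Counter, defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centroids):
    """Index of the centroid closest to vec (this index is the discrete unit)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))


def kmeans(frames, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids (the hypothetical unit codebook)."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        clusters = defaultdict(list)
        for f in frames:
            clusters[nearest(f, centroids)].append(f)
        for i, members in clusters.items():
            # Move each non-empty cluster's centroid to the mean of its frames.
            centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def encode(frames, centroids):
    """Assumed encoder step: quantize each frame, then collapse consecutive repeats."""
    units = [nearest(f, centroids) for f in frames]
    return [u for i, u in enumerate(units) if i == 0 or u != units[i - 1]]


class BigramUnitLM:
    """Stand-in 'language model': add-one-smoothed bigrams over pseudo-text units."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.counts = defaultdict(Counter)

    def fit(self, sequences):
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.counts[prev][nxt] += 1
        return self

    def prob(self, prev, nxt):
        row = self.counts[prev]
        return (row[nxt] + 1) / (sum(row.values()) + self.vocab_size)

    def generate(self, start, length, seed=0):
        """Sample a pseudo-text unit sequence of the given length."""
        rng = random.Random(seed)
        seq = [start]
        for _ in range(length - 1):
            weights = [self.prob(seq[-1], u) for u in range(self.vocab_size)]
            seq.append(rng.choices(range(self.vocab_size), weights=weights)[0])
        return seq


# Usage on synthetic 2-D "features" (real encoder features are high-dimensional).
frames = [[0.0, 0.0]] * 3 + [[5.0, 5.0]] * 2 + [[0.1, 0.0]] + [[10.0, 10.0]] * 2
codebook = kmeans(frames, k=3, iters=10, seed=1)
pseudo_text = encode(frames, codebook)
lm = BigramUnitLM(vocab_size=3).fit([pseudo_text])
sample = lm.generate(pseudo_text[0], length=5, seed=0)
```

Here `k` plays the role of the number of discrete units. In the abstract's setting that number would be 50, 100, or 200 rather than 3. A generated unit sequence such as `sample` is what a speech decoder would then turn back into a waveform.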

2016

Radarly : écouter et analyser le web conversationnel en temps réel (Real time listening and analysis of the social web using Radarly)
Jade Copet | Christine de Carvalho | Virginie Mouilleron | Benoit Tabutiaux | Hugo Zanghi
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 5 : Démonstrations (Proceedings of the joint JEP-TALN-RECITAL 2016 conference, Volume 5: Demonstrations)

Given the digital conversational context, the Radarly tool was designed to process large volumes of heterogeneous data in real time, generate new indicators, and visualize them in a consistent, user-friendly interface from which relevant analyses and studies can be drawn. This paper presents the techniques and processes used to extract and process all of this data.

Co-authors

Abdelrahman Mohamed 1

Emmanuel Dupoux 1
