Stephan Vanni

2014

LIUM English-to-French spoken language translation system and the Vecsys/LIUM automatic speech recognition system for Italian language for IWSLT 2014
Anthony Rousseau | Loïc Barrault | Paul Deléglise | Yannick Estève | Holger Schwenk | Samir Bennacef | Armando Muscariello | Stephan Vanni
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the Spoken Language Translation system developed by LIUM for the IWSLT 2014 evaluation campaign. We participated in two of the proposed tasks: (i) the Automatic Speech Recognition (ASR) task in two languages, Italian (with the Vecsys company) and English (alone), and (ii) the English-to-French Spoken Language Translation (SLT) task. We present the approaches and specificities of our systems, as well as the results from the evaluation campaign.

2008

CallSurf: Automatic Transcription, Indexing and Structuration of Call Center Conversational Speech for Knowledge Extraction and Query by Content
Martine Garnier-Rizet | Gilles Adda | Frederik Cailliau | Sylvie Guillemin-Lanne | Claire Waast-Richard | Lori Lamel | Stephan Vanni
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Being the clients' first interface, call centres worldwide hold a huge amount of information of all kinds in the form of conversational speech. If accessible, this information can be used to detect, e.g., major events and organizational flaws, and to improve customer relations and marketing strategies. An efficient way to exploit the unstructured data of telephone calls is data mining, but current techniques apply to text only. The CallSurf project gathers a number of academic and industrial partners covering the complete platform, from automatic transcription to information retrieval and data mining. This paper concentrates on the speech recognition module: it discusses the collection and manual transcription of the training corpus and the techniques used to build the language model. The NLP techniques used to pre-process the transcribed corpus for data mining are POS tagging, lemmatization, noun group recognition and named entity recognition. Some of them have been specially adapted to the characteristics of conversational speech. POS tagging and preliminary data mining results obtained on the manually transcribed corpus are briefly discussed.

Co-authors

Armando Muscariello (1)

Venues

LREC (1)
IWSLT (1)