Miriam Voghera
2014
VOLIP: a corpus of spoken Italian and a virtuous example of reuse of linguistic resources
Iolanda Alfano | Francesco Cutugno | Aurelio De Rosa | Claudio Iacobini | Renata Savy | Miriam Voghera
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
The VoLIP corpus (The Voice of LIP) is an Italian speech resource that aligns audio signals with the orthographic transcriptions of the LIP Corpus. The LIP Corpus was designed to represent diaphasic, diatopic and diamesic variation. It was collected in the early 1990s to compile a frequency lexicon of spoken Italian, and its size was tailored to produce a reliable frequency lexicon for the first 3,000 lemmas. It therefore consists of about 500,000 word tokens, corresponding to 60 hours of recording. The speech materials belong to five different text registers and were collected in four different cities. The VoLIP web service allows users to search the LIP Corpus by IMDI metadata or by lexical or morpho-syntactic keys, and returns the audio portions aligned with the matching entries. The VoLIP corpus is freely available at http://www.parlaritaliano.it.
2006
An observatory on Spoken Italian linguistic resources and descriptive standards.
Miriam Voghera | Francesco Cutugno
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
We present the national project Parlare italiano: osservatorio degli usi linguistici, funded by the Italian Ministry of Education, Scientific Research and University (PRIN 2004). Ten research groups from various Italian universities participate in the project. The project has four fundamental objectives: 1) to plan a national website that collects the most recent theoretical and applied results on spoken language; 2) to create an observatory of the linguistic usages of spoken Italian; 3) to define and implement standard, formalized methods and procedures for the study of spoken language; 4) to develop a training program for young researchers. The website will be accessible from November 2006.