Multilevel corpus analysis: generating and querying an AGset of spoken Italian (SpIt-MDb).

Renata Savy; Francesco Cutugno; Claudia Crocco

Abstract

In this paper we present an application of AGTK to a corpus of spoken Italian annotated at many different linguistic levels. The work consists of two parts: a) the presentation of AG-SpIt, a toolkit for corpus data management that we developed according to AGTK proposals; b) the presentation of the corpus structure, together with some examples and results of cross-level linguistic analyses obtained by querying the database (SpIt-MDb). As this work is still ongoing, the results must be considered preliminary: they serve as a demo illustrating the potential of the tool and the advantages it offers for validating linguistic theories and annotation systems. Currently, SpIt-MDb is a linguistic resource under development; it represents one of the first attempts to create an Italian corpus labelled at various linguistic levels (from acoustic/sub-phonetic to textual/pragmatic) which can be queried for the interrelations among levels.

Anthology ID: L06-1295
Volume: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month: May
Year: 2006
Address: Genoa, Italy
Editors: Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue: LREC
Publisher: European Language Resources Association (ELRA)
URL: http://www.lrec-conf.org/proceedings/lrec2006/pdf/492_pdf.pdf
Cite (ACL): Renata Savy, Francesco Cutugno, and Claudia Crocco. 2006. Multilevel corpus analysis: generating and querying an AGset of spoken Italian (SpIt-MDb). In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal): Multilevel corpus analysis: generating and querying an AGset of spoken Italian (SpIt-MDb). (Savy et al., LREC 2006)