The Sensem Corpus: a Corpus Annotated at the Syntactic and Semantic Level
Irene Castellón, Ana Fernández-Montraveta, Gloria Vázquez, Laura Alonso Alemany, Joan Antoni Capilla
Abstract
The primary aim of the project SENSEM (Sentence Semantics, BFF2003-06456) is the construction of a Lexical Data Base illustrating the syntactic and semantic behavior of each of the senses of the 250 most frequent verbs of Spanish. With this objective in mind, we are currently building an annotated corpus consisting of sentences extracted from the electronic version of the newspaper El Periódico de Catalunya, totalling approximately 1 million words, with 100 examples of each verb. By the time of the conference, we will be about to complete the annotation of 25,000 sentences, which means roughly a corpus of 800,000 words. Approximately 400,000 of them will have been revised. We expect to make the corpus publicly available by the end of 2006.
- Anthology ID:
- L06-1245
- Volume:
- Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
- Month:
- May
- Year:
- 2006
- Address:
- Genoa, Italy
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- URL:
- http://www.lrec-conf.org/proceedings/lrec2006/pdf/414_pdf.pdf
- Cite (ACL):
- Irene Castellón, Ana Fernández-Montraveta, Gloria Vázquez, Laura Alonso Alemany, and Joan Antoni Capilla. 2006. The Sensem Corpus: a Corpus Annotated at the Syntactic and Semantic Level. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
- Cite (Informal):
- The Sensem Corpus: a Corpus Annotated at the Syntactic and Semantic Level (Castellón et al., LREC 2006)