Sara Silveira


2012

A PropBank for Portuguese: the CINTIL-PropBank
António Branco | Catarina Carvalheiro | Sílvia Pereira | Sara Silveira | João Silva | Sérgio Castro | João Graça
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

With the CINTIL-International Corpus of Portuguese, an ongoing corpus annotated with fully fledged grammatical representations, sentences receive not only a high level of lexical, morphological and syntactic annotation but also a semantic analysis that prepares the data for a manual specification step and thus opens the way for a number of tools and resources that are currently a major research focus. This paper reports on the construction of a propbank that builds on CINTIL-DeepGramBank, with nearly 10 thousand sentences, on the basis of a deep linguistic grammar, and on the process and the linguistic criteria guiding that construction, which make it possible to obtain a complete PropBank with both syntactic and semantic levels of linguistic annotation. Given this and the promising inter-annotator agreement scores presented in this study, CINTIL-PropBank is a valuable resource for training a semantic role labeller, one of our goals with this project.

2010

Developing a Deep Linguistic Databank Supporting a Collection of Treebanks: the CINTIL DeepGramBank
António Branco | Francisco Costa | João Silva | Sara Silveira | Sérgio Castro | Mariana Avelãs | Clara Pinto | João Graça
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Corpora of sentences annotated with grammatical information have been deployed by extending the basic lexical and morphological data with increasingly complex information, such as phrase constituency, syntactic functions, semantic roles, etc. As these corpora grow in size and the linguistic information to be encoded reaches higher levels of sophistication, the use of annotation tools and, above all, of supporting computational grammars is no longer a matter of convenience but of necessity. In this paper, we report on the design features, the development conditions and the methodological options of a deep linguistic databank, the CINTIL DeepGramBank. In this corpus, sentences are annotated with fully fledged, linguistically informed grammatical representations that are produced by a deep linguistic processing grammar, thus consistently integrating morphological, syntactic and semantic information. We also report on how such a corpus makes it straightforward to obtain a whole range of past generation annotated corpora (POS, NER and morphology), current generation treebanks (constituency treebanks, dependency banks, propbanks) and next generation databanks (logical form banks) with only a minimal selection/extraction effort to get the appropriate "views" exposing the relevant layers of information.

2009

LX-Center: a center of online linguistic services
António Branco | Francisco Costa | Eduardo Ferreira | Pedro Martins | Filipe Nunes | João Silva | Sara Silveira
Proceedings of the ACL-IJCNLP 2009 Software Demonstrations

2008

LX-Service: Web Services of Language Technology for Portuguese
António Branco | Francisco Costa | Pedro Martins | Filipe Nunes | João Silva | Sara Silveira
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In the present paper we report on the development of a cluster of web services of language technology for Portuguese that we named LX-Service. These web services permit the direct interaction of client applications with language processing tools via the Internet. This way of making language technology available was motivated by the need to integrate it into an eLearning environment. In particular, it was motivated by the development of new multilingual functionalities that were aimed at extending a Learning Management System and that needed to use the output of some of those tools in a distributed and remote fashion. This specific usage situation, however, happens to be representative of a typical and recurrent setup in the use of language processing tools in different settings and projects. Therefore, the approach reported here not only offers a solution for the specific problem that immediately motivated it, but also contributes some first steps towards what we see as an important paradigm shift in the way language technology can be distributed so as to unleash its full potential and impact.