Vera Lima


2014

Boosting Open Information Extraction with Noun-Based Relations
Clarissa Xavier | Vera Lima
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Open Information Extraction (Open IE) is a strategy for learning relations from texts, regardless of the domain and without predefining these relations. Work in this area has focused mainly on verbal relations. In order to extend Open IE to extract relationships that are not expressed by verbs, we present a novel Open IE approach that extracts relations expressed in noun compounds (NCs), such as (oil, extracted from, olive) from “olive oil”, or in adjective-noun pairs (ANs), such as (moon, that is, gorgeous) from “gorgeous moon”. The approach consists of three steps: detection of NCs and ANs, interpretation of these compounds in order to enrich the corpus, and extraction of relations from the enriched corpus. To confirm the feasibility of this method, we created a prototype and evaluated the impact of applying our proposal to two state-of-the-art Open IE extractors. Based on these tests, we conclude that the proposed approach is an important step toward filling the gap in Open IE concerning the extraction of relations within noun compounds and adjective-noun pairs.
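As a rough illustration of the detection and interpretation steps described in this abstract, the sketch below turns adjacent adjective-noun and noun-noun pairs into triples. It is not the authors' prototype: the input is assumed to be already POS-tagged with coarse tags ("ADJ", "NOUN"), and the paraphrase table NC_PARAPHRASES and the function name are hypothetical stand-ins for the interpretation step, which the abstract does not specify.

```python
from typing import Dict, List, Tuple

Tagged = Tuple[str, str]          # (token, coarse POS tag); tagging is assumed done upstream
Triple = Tuple[str, str, str]     # (argument 1, relation, argument 2)

# Hypothetical paraphrase table for noun compounds (modifier, head) -> relation.
# The single entry mirrors the example given in the abstract.
NC_PARAPHRASES: Dict[Tuple[str, str], str] = {("olive", "oil"): "extracted from"}


def extract_noun_based_relations(tagged: List[Tagged]) -> List[Triple]:
    """Emit triples for adjacent adjective-noun and noun-noun pairs."""
    triples: List[Triple] = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == "ADJ" and t2 == "NOUN":
            # Adjective-noun pair: (noun, that is, adjective)
            triples.append((w2, "that is", w1))
        elif t1 == "NOUN" and t2 == "NOUN":
            # Noun compound: only interpreted when a paraphrase is known
            relation = NC_PARAPHRASES.get((w1, w2))
            if relation is not None:
                triples.append((w2, relation, w1))
    return triples


print(extract_noun_based_relations([("gorgeous", "ADJ"), ("moon", "NOUN")]))
print(extract_noun_based_relations([("olive", "NOUN"), ("oil", "NOUN")]))
```

In the approach described above, such interpretations are used to enrich the corpus before an Open IE extractor is run; the sketch skips that step and returns the triples directly.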

2012

Combining Formal Concept Analysis and semantic information for building ontological structures from texts : an exploratory study
Sílvia Moraes | Vera Lima
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This work studies conceptual structures based on the Formal Concept Analysis method. We build these structures from lexico-semantic information extracted from texts, among which we highlight semantic roles. In our research, we propose ways to include semantic roles in the concepts produced by this formal method. We analyze the contribution of semantic roles and verb classes to the composition of these concepts through structural measures. In these studies, we use the Penn Treebank Sample and SemLink 1.1 corpora, both in English.
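For readers unfamiliar with Formal Concept Analysis, the following minimal sketch enumerates the formal concepts (extent, intent) of a small context by closing the object intents under intersection. The context is invented toy data with verbs as objects and role labels as attributes; it is not taken from the Penn Treebank Sample or SemLink, and the function name is an assumption made for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def formal_concepts(context: Dict[str, Set[str]]) -> List[Concept]:
    """Return all formal concepts of a context given as object -> attribute set."""
    all_attributes = frozenset().union(*context.values())
    # Every intent is an intersection of object intents; the full attribute
    # set is the intent of the (possibly empty) bottom extent.
    intents = {all_attributes}
    for attributes in context.values():
        intents |= {frozenset(attributes) & intent for intent in intents}
    concepts: List[Concept] = []
    for intent in intents:
        # The extent collects every object that has all attributes in the intent
        extent = frozenset(obj for obj, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return concepts


# Toy context (invented): verbs described by the roles they take.
toy_context = {
    "give": {"Agent", "Theme", "Recipient"},
    "send": {"Agent", "Theme", "Recipient"},
    "eat": {"Agent", "Theme"},
}

for extent, intent in formal_concepts(toy_context):
    print(sorted(extent), sorted(intent))
```

This brute-force closure is fine for small contexts; larger ones would normally call for a dedicated algorithm or library.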

2008

Keywords, k-NN and Neural Networks: a Support for Hierarchical Categorization of Texts in Brazilian Portuguese
Susana Azeredo | Silvia Moraes | Vera Lima
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

A frequent problem in automatic categorization applications involving the Portuguese language is the absence of large corpora of previously classified documents that would permit the validation of experiments. Generally, the available corpora are not classified or, when they are, they contain a very small number of documents. The general goal of this study is to contribute to the development of text categorization applications for Brazilian Portuguese. Specifically, we show that keyword selection combined with neural networks can improve results in the categorization of Brazilian Portuguese texts. The corpus is composed of 30 thousand texts from the Folha de São Paulo newspaper, organized in 29 sections. In the categorization process, the k-Nearest Neighbor (k-NN) algorithm and Multilayer Perceptron neural networks trained with the backpropagation algorithm are used. Our study also tests the identification of keywords based on the log-likelihood statistical measure and their use as features in the categorization process. The results clearly show that precision is better with neural networks than with k-NN.
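The abstract does not say which formulation of the log-likelihood measure was used. As a sketch under that caveat, the function below computes the common two-corpus log-likelihood (G2) keyness score, comparing a term's frequency in one category against its frequency in the rest of the corpus. The function and parameter names are assumptions made for illustration.

```python
import math


def log_likelihood(freq_in_cat: int, freq_in_rest: int,
                   size_cat: int, size_rest: int) -> float:
    """Two-corpus log-likelihood (G2) score for one term.

    freq_in_cat / freq_in_rest: occurrences of the term in the category
    and in the remaining documents; size_cat / size_rest: total token
    counts of those two sub-corpora.
    """
    total_freq = freq_in_cat + freq_in_rest
    total_size = size_cat + size_rest
    # Expected frequencies if the term were spread evenly over both parts
    expected_cat = size_cat * total_freq / total_size
    expected_rest = size_rest * total_freq / total_size
    score = 0.0
    # A zero observed count contributes nothing (0 * ln 0 is taken as 0)
    if freq_in_cat > 0:
        score += freq_in_cat * math.log(freq_in_cat / expected_cat)
    if freq_in_rest > 0:
        score += freq_in_rest * math.log(freq_in_rest / expected_rest)
    return 2.0 * score


# A term that is equally frequent in both parts scores zero;
# a term concentrated in the category scores higher.
print(log_likelihood(10, 10, 1000, 1000))
print(log_likelihood(20, 5, 1000, 1000))
```

Terms with the highest scores for a category would then be candidates for the keyword features passed to a classifier such as k-NN or a Multilayer Perceptron.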