Lucelene Lopes

This paper presents PortiLexicon-UD, a large and freely available lexicon for Portuguese delivering morphosyntactic information according to the Universal Dependencies model. This lexical resource includes part-of-speech tags, lemmas, and morphological information for words, with 1,221,218 entries (counting a word more than once when it occurs with different combinations of PoS tag, lemma, and morphological features). We report the lexicon creation process, its computational data structure, and its evaluation over an annotated corpus, showing that it has high language coverage and good data quality.

2015

Building and Applying Profiles Through Term Extraction
Lucelene Lopes | Renata Vieira
Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology

2014

This paper proposes a method to build bilingual dictionaries for specific domains defined by a parallel corpus. The proposed method is based on an original method that is not domain specific. Both the original and the proposed methods are built from previously available natural language processing tools. Therefore, this paper's contribution resides in the choice and parametrization of those tools. To illustrate the benefits of the proposed method, we conduct an experiment over technical manuals in English and Portuguese. The results of our proposed method were analyzed by human specialists, and they indicate significant increases in precision for unigrams and multi-grams. Numerically, the precision increase is as large as 15% according to our evaluation.

Comparative Analysis of Portuguese Named Entities Recognition Tools
Daniela Amaral | Evandro Fonseca | Lucelene Lopes | Renata Vieira
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper describes an experiment comparing four tools for recognizing named entities in Portuguese texts. The experiment was carried out over the HAREM corpora, a gold standard for named entity recognition in Portuguese. The tools evaluated are based on natural language processing techniques and also on machine learning. Specifically, one of the tools is based on Conditional Random Fields, a machine learning model that has been used for named entity recognition in several languages, while the other tools follow more traditional natural language processing approaches. The comparison results indicate advantages for different tools depending on the class of named entity. Despite this balance among the tools, we conclude by pointing out foreseeable advantages of the machine learning based tool.

2013

Aplicando Pontos de Corte para Listas de Termos Extraídos (Applying Cut-off Points to Lists of Extracted Terms) [in Portuguese]
Lucelene Lopes | Renata Vieira
Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology

2012

A Fast, Memory Efficient, Scalable and Multilingual Dictionary Retriever
Paulo Fernandes | Lucelene Lopes | Carlos A. Prolo | Afonso Sales | Renata Vieira
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents a novel approach to dictionary retrieval. The approach is based on a very efficient and scalable theoretical structure called Multi-Terminal Multi-valued Decision Diagrams (MTMDD). This structure allows the definition of very large, even multilingual, dictionaries without a significant increase in memory demands, and with virtually no additional processing cost. Besides the general idea of the approach, this paper presents a description of the technologies involved and their implementation in a software package called WAGGER. Finally, we also present some examples of usage and possible applications of this dictionary retriever.