2023
Proceedings of the 2nd Edition of the Universal Dependencies Brazilian Festival
Thiago Alexandre Salgueiro Pardo | Magali Sanches Duran | Lucelene Lopes
Proceedings of the 2nd Edition of the Universal Dependencies Brazilian Festival
Verifica-UD: a Verifier for Universal Dependencies Annotation for Portuguese
Lucelene Lopes | Magali Sanches Duran | Thiago Alexandre Salgueiro Pardo
Proceedings of the 2nd Edition of the Universal Dependencies Brazilian Festival
Enhanced dependencies para o português brasileiro (Enhanced Dependencies for Brazilian Portuguese)
Adriana S. Pagano | Magali Sanches Duran | Thiago Alexandre Salgueiro Pardo
Proceedings of the 2nd Edition of the Universal Dependencies Brazilian Festival
Insights into the UD Tagset: Unveiling its Intricacies
Magali Sanches Duran
Proceedings of the 2nd Edition of the Universal Dependencies Brazilian Festival
2022
PortiLexicon-UD: a Portuguese Lexical Resource according to Universal Dependencies Model
Lucelene Lopes | Magali Duran | Paulo Fernandes | Thiago Pardo
Proceedings of the Thirteenth Language Resources and Evaluation Conference
This paper presents PortiLexicon-UD, a large, freely available lexicon for Portuguese that provides morphosyntactic information according to the Universal Dependencies model. The resource includes part-of-speech tags, lemmas, and morphological features for words, totaling 1,221,218 entries (a word is counted more than once when it occurs with different combinations of PoS tag, lemma, and morphological features). We report the lexicon creation process, its computational data structure, and its evaluation over an annotated corpus, showing that it has high language coverage and good data quality.
2021
On auxiliary verb in Universal Dependencies: untangling the issue and proposing a systematized annotation strategy
Magali Duran | Adriana Pagano | Amanda Rassi | Thiago Pardo
Proceedings of the Sixth International Conference on Dependency Linguistics (Depling, SyntaxFest 2021)
2018
A Nontrivial Sentence Corpus for the Task of Sentence Readability Assessment in Portuguese
Sidney Evaldo Leal | Magali Sanches Duran | Sandra Maria Aluísio
Proceedings of the 27th International Conference on Computational Linguistics
Effective textual communication in a reading task depends on readers being proficient enough to comprehend texts, and on texts being clear enough to be understood by the intended audience. When the meaning of textual information and instructions is not well conveyed, many losses and damages may occur. One way to alleviate this problem is the automatic evaluation of sentence readability, a task that has received considerable attention due to its broad applicability. However, a shortage of resources, such as corpora for training and evaluation, hinders the full development of this task. In this paper, we generate a nontrivial sentence corpus in Portuguese. We evaluate three scenarios for building it, taking advantage of a parallel simplification corpus in which each sentence triplet is aligned and annotated with simplification operations, which makes it ideal for explaining possible mistakes of future methods. The best scenario of our corpus, PorSimplesSent, comprises 4,888 pairs, more than a similar corpus for English; all three versions are publicly available. We created four baselines for PorSimplesSent and made available a pairwise ranking method, using 17 linguistic and psycholinguistic features, which correctly identifies the ranking of sentence pairs with an accuracy of 74.2%.
2015
Automatic Generation of a Lexical Resource to support Semantic Role Labeling in Portuguese
Magali Sanches Duran | Sandra Aluísio
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics
A Normalizer for UGC in Brazilian Portuguese
Magali Sanches Duran | Maria das Graças Volpe Nunes | Lucas Avanço
Proceedings of the Workshop on Noisy User-generated Text
2014
Generating a Lexicon of Errors in Portuguese to Support an Error Identification System for Spanish Native Learners
Lianet Sepúlveda Torres | Magali Sanches Duran | Sandra Aluísio
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Portuguese is a less-resourced language with regard to foreign language learning. Aiming to inform a module of a system designed to support the scientific writing of native Spanish speakers learning Portuguese, we developed an approach to automatically generate a lexicon of wrong words that reproduces the language transfer errors made by such learners. Besides the wrong word, each item of the artificially generated lexicon contains the corresponding correct Spanish and Portuguese words. The wrong word is used to identify the interlanguage error, and the correct Spanish and Portuguese forms are used to generate suggestions. By keeping track of the correct word forms, we can provide corrections or, at least, useful suggestions for learners. We propose combining two automatic procedures to obtain the error correction: i) a similarity measure and ii) a translation algorithm based on an aligned parallel corpus. The similarity-based method achieved a precision of 52%, whereas the alignment-based method achieved a precision of 90%. In this paper, we focus only on interlanguage errors involving suffixes that have different forms in the two languages. The approach, however, is very promising for tackling other types of errors, such as gender errors.
A Large Corpus of Product Reviews in Portuguese: Tackling Out-Of-Vocabulary Words
Nathan Hartmann | Lucas Avanço | Pedro Balage | Magali Duran | Maria das Graças Volpe Nunes | Thiago Pardo | Sandra Aluísio
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Web 2.0 has enabled an unprecedented communication boom. With the widespread use of computers and mobile devices, anyone, in practically any language, may post comments on the web. As a result, formal language is not necessarily used. In fact, in these communicative situations, language is marked by the absence of more complex syntactic structures and by the presence of internet slang, missing diacritics, vowel repetitions, chat-speak abbreviations, emoticons, and colloquial expressions. Such language use poses severe new challenges for Natural Language Processing (NLP) tools and applications, which, so far, have focused on well-written texts. In this work, we report the construction of a large web corpus of product reviews in Brazilian Portuguese and the analysis of its lexical phenomena, which support the development of a lexical normalization tool that, in future work, will enable the use of standard NLP tools for web opinion mining and summarization.
Some Issues on the Normalization of a Corpus of Products Reviews in Portuguese
Magali Sanches Duran | Lucas Avanço | Sandra Aluísio | Thiago Pardo | Maria da Graça Volpe Nunes
Proceedings of the 9th Web as Corpus Workshop (WaC-9)
2013
Identifying Pronominal Verbs: Towards Automatic Disambiguation of the Clitic ‘se’ in Portuguese
Magali Sanches Duran | Carolina Evaristo Scarton | Sandra Maria Aluísio | Carlos Ramisch
Proceedings of the 9th Workshop on Multiword Expressions
Um repositório de verbos para a anotação de papéis semânticos disponível na web (A Verb Repository for Semantic Role Labeling Available in the Web) [in Portuguese]
Magali Sanches Duran | Jhonata Pereira Martins | Sandra Maria Aluísio
Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology
2012
Propbank-Br: a Brazilian Treebank annotated with semantic role labels
Magali Sanches Duran | Sandra Maria Aluísio
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper reports the annotation of a Brazilian Portuguese Treebank with semantic role labels following the Propbank guidelines. A different language and a different parser output affect the task and require some decisions on how to annotate the corpus. Therefore, a new annotation guide, called Propbank-Br, has been created to deal with specific language phenomena and parser problems. In this phase of the project, the corpus was annotated by a single linguist. The annotation task reported here is part of a larger project for the Brazilian Portuguese language. This project aims to build frame files for Brazilian verbs and a broader, distributed annotation of semantic role labels in Brazilian Portuguese, allowing inter-annotator agreement measures. The corpus, available on the web, is already being used to build a semantic tagger for Portuguese.
2011
Identifying and Analyzing Brazilian Portuguese Complex Predicates
Magali Sanches Duran | Carlos Ramisch | Sandra Maria Aluísio | Aline Villavicencio
Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
Propbank-Br: a Brazilian Portuguese corpus annotated with semantic role labels
Magali Sanches Duran | Sandra Maria Aluísio
Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology
2010
Assigning Wh-Questions to Verbal Arguments: Annotation Tools Evaluation and Corpus Building
Magali Sanches Duran | Marcelo Adriano Amâncio | Sandra Maria Aluísio
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This work reports the evaluation and selection of annotation tools for assigning wh-question labels to verbal arguments in a sentence. The wh-question assignment discussed here is a kind of semantic annotation that involves two tasks: delimiting verbs and arguments, and linking verbs to their arguments through question labels. As it is a new type of semantic annotation, there are no reports on the requirements an annotation tool should meet to support it. For this reason, we decided to select the most appropriate tool in two phases. In the first phase, we performed the task with an annotation tool we had previously used for another task. This phase allowed us to test the task and to learn which features were or were not desirable in an annotation tool for our purpose. In the second phase, guided by these requirements, we evaluated several tools and selected one for the real task. After concluding the corpus annotation, we report some of the annotation results and comment on improvements that should be made to annotation tools to better support this kind of annotation task.