Rafael Dias


2020

Cross-domain Author Gender Classification in Brazilian Portuguese
Rafael Dias | Ivandré Paraboni
Proceedings of the Twelfth Language Resources and Evaluation Conference

Author profiling models predict demographic characteristics of a target author based on the text that they have written. Systems of this kind often follow a single-domain approach, in which the model is trained on a corpus of labelled texts in a given domain and subsequently validated against a test corpus built from the same domain. Although single-domain settings are arguably ideal, this strategy raises the question of how to proceed when no suitable training corpus (i.e., a corpus that matches the test domain) is available. To shed light on this issue, this paper discusses a cross-domain gender classification task based on four domains (Facebook, crowd-sourced opinions, blogs and e-gov requests) in the Brazilian Portuguese language. A number of simple gender classification models using both word-based and psycholinguistics-based features are introduced, and their results are compared in two kinds of cross-domain settings: first, by using a single text source as training data for each task, and subsequently by combining multiple sources. Results confirm previous findings for English regarding the effects of corpus size and domain similarity, and pave the way for further studies in the field.

2018

Building a Corpus for Personality-dependent Natural Language Understanding and Generation
Ricelli Ramos | Georges Neto | Barbara Silva | Danielle Monteiro | Ivandré Paraboni | Rafael Dias
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Author Profiling from Facebook Corpora
Fernando Hsieh | Rafael Dias | Ivandré Paraboni
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)