Larissa Freitas
2020
An Assessment of Language Identification Methods on Tweets and Wikipedia Articles
Pedro Vernetti | Larissa Freitas
Proceedings of the Fourth Widening Natural Language Processing Workshop
Language identification is the task of determining the language in which a given text is written. This task is important for Natural Language Processing and Information Retrieval activities. Two popular approaches to language identification are the N-gram and stopword models. In this paper, these two models were tested on different types of documents: short, irregular texts (tweets) and long, regular texts (Wikipedia articles).
A Comparison of Identification Methods of Brazilian Music Styles by Lyrics
Patrick Guimarães | Jader Froes | Douglas Costa | Larissa Freitas
Proceedings of the Fourth Widening Natural Language Processing Workshop
In this work, we apply different techniques to the task of genre classification using lyrics. Using our dataset of lyrics from typical Brazilian genres, divided into seven classes, we apply several models commonly used in machine learning and deep learning classification tasks. We explore the performance of standard text classification models on Portuguese-language input. We also compare RNNs with classic machine learning approaches for text classification, exploring the most widely used methods in the field.