Viviane Moreira


2022

INF-UFRGS at SemEval-2022 Task 5: analyzing the performance of multimodal models
Gustavo Lorentz | Viviane Moreira
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes the INF-UFRGS submission to SemEval-2022 Task 5: Multimodal Automatic Misogyny Identification (MAMI). Unprecedented levels of harassment came with the ever-growing use of the internet as a means of worldwide communication. The goal of the task is to improve the quality of existing methods for misogyny identification, many of which require dedicated personnel, hence the need for automation. We experimented with five existing models, including ViLBERT and Visual BERT (both uni- and multimodally pretrained) and MMBT. The datasets consist of memes with captions in English. The results show that all models achieved Macro-F1 scores above 0.64. ViLBERT was the best performer with a score of 0.698.

UFRGSent at SemEval-2022 Task 10: Structured Sentiment Analysis using a Question Answering Model
Lucas Pessutto | Viviane Moreira
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes the system submitted by our team (UFRGSent) to SemEval-2022 Task 10: Structured Sentiment Analysis. We propose a multilingual approach that relies on a Question Answering model to find tuples consisting of aspect, opinion, and holder. The approach starts from general questions and uses the extracted tuple elements to find the remaining components. Finally, we employ an aspect sentiment classification model to classify the polarity of the entire tuple. Although our method reached only a mid-rank position in the SemEval competition, we show that the question-answering approach can achieve good coverage when retrieving sentiment tuples, leaving room for improvements in the technique.

2020

Offensive Video Detection: Dataset and Baseline Results
Cleber Alcântara | Viviane Moreira | Diego Feijo
Proceedings of the Twelfth Language Resources and Evaluation Conference

Web users produce and publish high volumes of data of various types, such as text, images, and videos. Platforms try to keep their users from publishing offensive content in order to maintain a friendly and respectful environment, and they rely on moderators to filter posts. However, this method is insufficient due to the high volume of publications. The identification of offensive material can be performed automatically using machine learning, which requires annotated datasets. Among the published datasets on this topic, the Portuguese language is underrepresented, and videos are little explored. We investigated the problem of offensive video detection by assembling and publishing a dataset of videos in Portuguese containing mostly textual features. We ran experiments using popular machine learning classifiers from this domain and reported our findings alongside multiple evaluation metrics. We found that using word embeddings with Deep Learning classifiers achieved the best results on average. CNN architectures, Naive Bayes, and Random Forest ranked at the top across different experiments. Transfer Learning models outperformed Classic algorithms when processing video transcriptions, but scored lower when using other feature sets. These findings can be used as a baseline for future work on this subject.

Embeddings for Named Entity Recognition in Geoscience Portuguese Literature
Bernardo Consoli | Joaquim Santos | Diogo Gomes | Fabio Cordeiro | Renata Vieira | Viviane Moreira
Proceedings of the Twelfth Language Resources and Evaluation Conference

This work focuses on Portuguese Named Entity Recognition (NER) in the Geology domain. The only domain-specific dataset in the Portuguese language annotated for NER is the GeoCorpus. Our approach relies on BiLSTM-CRF neural networks (a widely used type of network for this area of research) that use vector and tensor embedding representations. Three types of embedding models were used (Word Embeddings, Flair Embeddings, and Stacked Embeddings) under two versions (domain-specific and generalized). The domain-specific Flair Embeddings model was originally trained with a generalized context in mind, but was then fine-tuned with domain-specific Oil and Gas corpora, as there were not enough domain corpora to properly train such a model. Each of these embeddings was evaluated separately, as well as stacked with another embedding. Finally, we achieved state-of-the-art results for this domain with one of our embeddings, and we performed an error analysis on the language model that achieved the best results. Furthermore, we investigated the effects of domain-specific versus generalized embeddings.

2019

Summarizing Legal Rulings: Comparative Experiments
Diego Feijo | Viviane Moreira
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

In the context of text summarization, texts in the legal domain have peculiarities related to their length and to their specialized vocabulary. Recent neural network-based approaches can achieve high-quality scores for text summarization. However, these approaches have been used mostly for generating very short abstracts of news articles. Thus, their applicability to the legal domain remains an open issue. In this work, we experimented with ten extractive and four abstractive models on a real dataset of legal rulings. These models were compared with an extractive baseline based on heuristics to select the most relevant parts of the text. Our results show that abstractive approaches significantly outperform extractive methods in terms of ROUGE scores.

2018

A Large Parallel Corpus of Full-Text Scientific Articles
Felipe Soares | Viviane Moreira | Karin Becker
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2015

UFRGS: Identifying Categories and Targets in Customer Reviews
Anderson Kauer | Viviane Moreira
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

Comparing the Quality of Focused Crawlers and of the Translation Resources Obtained from them
Bruno Laranjeira | Viviane Moreira | Aline Villavicencio | Carlos Ramisch | Maria José Finatto
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Comparable corpora have been used as an alternative to parallel corpora as resources for computational tasks that involve domain-specific natural language processing. One way to gather documents related to a specific topic of interest is to traverse a portion of the web graph in a targeted way, using focused crawling algorithms. In this paper, we compare several focused crawling algorithms by using them to collect comparable corpora on a specific domain. Then, we compare the evaluation of the focused crawling algorithms to the performance of linguistic processes executed after training with the corresponding generated corpora. We also propose a novel approach for focused crawling that exploits the expressive power of multiword expressions.

2011

Identification and Treatment of Multiword Expressions Applied to Information Retrieval
Otavio Acosta | Aline Villavicencio | Viviane Moreira
Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World