Karin Becker


2018

A Large Parallel Corpus of Full-Text Scientific Articles
Felipe Soares | Viviane Moreira | Karin Becker
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

UFRGS Participation on the WMT Biomedical Translation Shared Task
Felipe Soares | Karin Becker
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the machine translation systems developed by the Universidade Federal do Rio Grande do Sul (UFRGS) team for the biomedical translation shared task. Our systems are based on statistical machine translation and neural machine translation, using the Moses and OpenNMT toolkits, respectively. We participated in four translation directions for the English/Spanish and English/Portuguese language pairs. To create our training data, we concatenated several parallel corpora from both in-domain and out-of-domain sources, as well as terminological resources from UMLS. Our systems achieved the best BLEU scores according to the official shared task evaluation.

2017

INF-UFRGS at SemEval-2017 Task 5: A Supervised Identification of Sentiment Score in Tweets and Headlines
Tiago Zini | Karin Becker | Marcelo Dias
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes a supervised solution for detecting the polarity scores of tweets or news headlines in the financial domain, submitted to the SemEval 2017 Fine-Grained Sentiment Analysis on Financial Microblogs and News Task. The premise is that it is possible to understand market reaction to a company's stock by measuring the positive/negative sentiment contained in financial tweets and news headlines, where polarity is measured on a continuous scale ranging from -1.0 (very bearish) to 1.0 (very bullish). Our system receives as input the textual content of tweets or news headlines, together with their IDs, the stock cashtag or name of the target company, and the gold-standard polarity score for the training dataset. Our solution extracts features from these text instances using n-grams, hashtags, sentiment scores calculated by external APIs, and other features to train a regression model capable of predicting the continuous score of these sentiments with precision.

2016

INF-UFRGS-OPINION-MINING at SemEval-2016 Task 6: Automatic Generation of a Training Corpus for Unsupervised Identification of Stance in Tweets
Marcelo Dias | Karin Becker
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)