2020
pdf
abs
UPB at SemEval-2020 Task 8: Joint Textual and Visual Modeling in a Multi-Task Learning Architecture for Memotion Analysis
George-Alexandru Vlad
|
George-Eduard Zaharia
|
Dumitru-Clementin Cercel
|
Costin Chiru
|
Stefan Trausan-Matu
Proceedings of the Fourteenth Workshop on Semantic Evaluation
Users from the online environment can create different ways of expressing their thoughts, opinions, or conception of amusement. Internet memes were created specifically for these situations. Their main purpose is to transmit ideas by using combinations of images and texts such that they will create a certain state for the receptor, depending on the message the meme has to send. These posts can be related to various situations or events, thus adding a funny side to any circumstance our world is situated in. In this paper, we describe the system developed by our team for SemEval-2020 Task 8: Memotion Analysis. More specifically, we introduce a novel system to analyze these posts, a multimodal multi-task learning architecture that combines ALBERT for text encoding with VGG-16 for image representation. In this manner, we show that the information behind them can be properly revealed. Our approach achieves good performance on each of the three subtasks of the current competition, ranking 11th for Subtask A (0.3453 macro F1-score), 1st for Subtask B (0.5183 macro F1-score), and 3rd for Subtask C (0.3171 macro F1-score) while exceeding the official baseline results by high margins.
2019
pdf
abs
Building a Comprehensive Romanian Knowledge Base for Drug Administration
Bogdan Nicula
|
Mihai Dascalu
|
Maria-Dorinela Sîrbu
|
Ștefan Trăușan-Matu
|
Alexandru Nuță
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Information on drug administration is obtained traditionally from doctors and pharmacists, as well as leaflets which provide in most cases cumbersome and hard-to-follow details. Thus, the need for medical knowledge bases emerges to provide access to concrete and well-structured information which can play an important role in informing patients. This paper introduces a Romanian medical knowledge base focused on drug-drug interactions, on representing relevant drug information, and on symptom-disease relations. The knowledge base was created by extracting and transforming information using Natural Language Processing techniques from both structured and unstructured sources, together with manual annotations. The resulting Romanian ontologies are aligned with larger English medical ontologies. Our knowledge base supports queries regarding drugs (e.g., active ingredients, concentration, expiration date), drug-drug interaction, symptom-disease relations, as well as drug-symptom relations.
pdf
abs
SC-UPB at the VarDial 2019 Evaluation Campaign: Moldavian vs. Romanian Cross-Dialect Topic Identification
Cristian Onose
|
Dumitru-Clementin Cercel
|
Stefan Trausan-Matu
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
This paper describes our models for the Moldavian vs. Romanian Cross-Topic Identification (MRC) evaluation campaign, part of the VarDial 2019 workshop. We focus on the three subtasks for MRC: binary classification between the Moldavian (MD) and the Romanian (RO) dialects and two cross-dialect multi-class classification between six news topics, MD to RO and RO to MD. We propose several deep learning models based on long short-term memory cells, Bidirectional Gated Recurrent Unit (BiGRU) and Hierarchical Attention Networks (HAN). We also employ three word embedding models to represent the text as a low dimensional vector. Our official submission includes two runs of the BiGRU and HAN models for each of the three subtasks. The best submitted model obtained the following macro-averaged F1 scores: 0.708 for subtask 1, 0.481 for subtask 2 and 0.480 for the last one. Due to a read error caused by the quoting behaviour over the test file, our final submissions contained a smaller number of items than expected. More than 50% of the submission files were corrupted. Thus, we also present the results obtained with the corrected labels for which the HAN model achieves the following results: 0.930 for subtask 1, 0.590 for subtask 2 and 0.687 for the third one.
2017
pdf
bib
abs
oIQa: An Opinion Influence Oriented Question Answering Framework with Applications to Marketing Domain
Dumitru-Clementin Cercel
|
Cristian Onose
|
Stefan Trausan-Matu
|
Florin Pop
Proceedings of the 1st Workshop on Natural Language Processing and Information Retrieval associated with RANLP 2017
Understanding questions and answers in QA system is a major challenge in the domain of natural language processing. In this paper, we present a question answering system that influences the human opinions in a conversation. The opinion words are quantified by using a lexicon-based method. We apply Latent Semantic Analysis and the cosine similarity measure between candidate answers and each question to infer the answer of the chatbot.
2016
pdf
Using Embedding Masks for Word Categorization
Stefan Ruseti
|
Traian Rebedea
|
Stefan Trausan-Matu
Proceedings of the 1st Workshop on Representation Learning for NLP