Daniela Gifu

Also published as: Daniela Gîfu


2021

FII_CROSS at SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation
Ciprian Bodnar | Andrada Tapuc | Cosmin Pintilie | Daniela Gifu | Diana Trandabat
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper presents a word-in-context disambiguation system. The task focuses on capturing the polysemous nature of words in a multilingual and cross-lingual setting, without considering a strict inventory of word meanings. The system applies Natural Language Processing algorithms on datasets from SemEval 2021 Task 2, being able to identify the meaning of words for the languages Arabic, Chinese, English, French and Russian, without making use of any additional mono- or multilingual resources.

FII FUNNY at SemEval-2021 Task 7: HaHackathon: Detecting and rating Humor and Offense
Mihai Samson | Daniela Gifu
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

The “HaHackathon: Detecting and Rating Humor and Offense” task at the SemEval 2021 competition focuses on detecting and rating the level of humor in sentences, as well as the level of offensiveness in these humorous texts. In this paper, we present an approach based on recent Deep Learning techniques, both training models solely on the task dataset and fine-tuning models pre-trained on a very large corpus.

2020

FII-UAIC at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text Using CNN
Lavinia Aparaschivei | Andrei Palihovici | Daniela Gîfu
Proceedings of the Fourteenth Workshop on Semantic Evaluation

The “Sentiment Analysis for Code-Mixed Social Media Text” task at the SemEval 2020 competition focuses on sentiment analysis in code-mixed social media text, specifically on the combination of English with Spanish (Spanglish) and Hindi (Hinglish). In this paper, we present a system able to classify tweets written in Spanish and English as positive, negative or neutral. First, we built a classifier able to provide the corresponding sentiment labels. Besides the sentiment labels, we provide language labels at the word level. Second, we generate a word-level representation using a Convolutional Neural Network (CNN) architecture. Our solution shows promising results for the Sentimix Spanglish-English task (0.744), where the team, Lavinia_Ap, ranked 9th. However, for the Sentimix Hindi-English task (0.324), the results need to be improved.

UAIC1860 at SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles
Vlad Ermurachi | Daniela Gifu
Proceedings of the Fourteenth Workshop on Semantic Evaluation

The “Detection of Propaganda Techniques in News Articles” task at the SemEval 2020 competition focuses on detecting and classifying propaganda, which is pervasive in news articles. In this paper, we present a system that evaluates, at the sentence level, three traditional text representation techniques for this purpose: tf*idf, word n-grams and character n-grams. First, we built a binary classifier that labels each sentence as propaganda or non-propaganda. Second, we built a multilabel multiclass model to identify the propaganda technique applied.
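
The text representations named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the example sentences, the helper names (`word_ngrams`, `char_ngrams`, `tfidf`), the n-gram sizes and the particular tf*idf variant (relative term frequency times log inverse document frequency) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Word n-grams, prefixed with 'w:' so they never collide with character n-grams."""
    tokens = text.lower().split()
    return ["w:" + " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Character n-grams (spaces included), prefixed with 'c:'."""
    s = text.lower()
    return ["c:" + s[i:i + n] for i in range(len(s) - n + 1)]


def tfidf(docs_features):
    """Weight each feature by relative term frequency times log(N / document frequency)."""
    n_docs = len(docs_features)
    df = Counter()
    for feats in docs_features:
        df.update(set(feats))  # count each feature once per sentence
    vectors = []
    for feats in docs_features:
        tf = Counter(feats)
        total = len(feats) or 1
        vectors.append(
            {f: (c / total) * math.log(n_docs / df[f]) for f, c in tf.items()}
        )
    return vectors


# Hypothetical sentences, invented for this sketch (not taken from the task data).
sentences = [
    "they will destroy our great nation",
    "the council met on tuesday",
    "the council will vote on tuesday",
]

# Sentence-level features: word unigrams plus character trigrams (sizes are assumptions).
features = [word_ngrams(s, 1) + char_ngrams(s, 3) for s in sentences]
vectors = tfidf(features)
```

Under this weighting, a feature that occurs in every sentence gets weight zero, while a feature unique to one sentence gets the largest weight. Sparse vectors of this kind could then be passed to any standard classifier for the binary propaganda/non-propaganda decision.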

CoBiLiRo: A Research Platform for Bimodal Corpora
Dan Cristea | Ionuț Pistol | Șerban Boghiu | Anca-Diana Bibiri | Daniela Gîfu | Andrei Scutelnicu | Mihaela Onofrei | Diana Trandabăț | George Bugeag
Proceedings of the 1st International Workshop on Language Technology Platforms

This paper describes the ongoing work carried out within the CoBiLiRo (Bimodal Corpus for Romanian Language) research project, part of ReTeRom (Resources and Technologies for Developing Human-Machine Interfaces in Romanian). Data annotation is increasingly used in speech recognition and synthesis, with the goal of supporting learning processes. In this context, a variety of annotation systems for Speech and Text Processing environments have been presented. Although many designs for the data annotation workflow have emerged, the handling of metadata to manage complex user-defined annotations has not been covered sufficiently. We propose a format designed to serve as an annotation standard for bimodal resources, which facilitates searching, editing and statistical analysis operations over them. The design and implementation of an infrastructure that houses the resources are also presented. The goal is to widen the dissemination of bimodal corpora for research valorisation and use in applications. This study also reports on the main operations of the web Platform that hosts the corpus and on the automatic conversion flows that bring the submitted files into the format accepted by the Platform.

A Real-Time System for Credibility on Twitter
Adrian Iftene | Daniela Gifu | Andrei-Remus Miron | Mihai-Stefan Dudu
Proceedings of the 12th Language Resources and Evaluation Conference

Nowadays, social media credibility is a pressing issue for all of us living in an altered online landscape. The speed of news diffusion is striking. Given the popularity of social networks, more and more users have begun posting pictures, information, and news about their personal lives. At the same time, they have started to use this information to learn what their friends are doing or what is happening in the world, and much of it arouses suspicion. The problem is that we currently have no automatic method for determining in real time which news items or users are credible and which are not, and what is false or true on the Internet. The goal of this work is to analyze Twitter in real time using neural networks in order to provide key elements about the credibility of both tweets and the users who posted them. Thus, we build a real-time heatmap from information gathered from users, giving an overall picture of the areas from which fake news originates.

2019

Hope at SemEval-2019 Task 6: Mining social media language to discover offensive language
Gabriel Florentin Patras | Diana Florina Lungu | Daniela Gifu | Diana Trandabat
Proceedings of the 13th International Workshop on Semantic Evaluation

User content sharing through social media has reached huge proportions nowadays. However, along with the free expression of thoughts on social media, people risk being exposed to various aggressive statements. In this paper, we present a system able to identify and classify offensive user-generated content.

2018

EmoIntens Tracker at SemEval-2018 Task 1: Emotional Intensity Levels in #Tweets
Ramona-Andreea Turcu | Sandra Maria Amarandei | Iuliana-Alexandra Flescan-Lovin-Arseni | Daniela Gifu | Diana Trandabat
Proceedings of The 12th International Workshop on Semantic Evaluation

The “Affect in Tweets” task centers on emotion categorization and evaluation using multilingual tweets (English and Spanish). In this research, the SemEval Affect dataset was preprocessed, categorized, and evaluated accordingly (precision, recall, and accuracy). The system described in this paper is based on supervised machine learning (Naive Bayes, KNN and SVM), deep learning (a TensorFlow neural network model), and decision tree algorithms.

The Dabblers at SemEval-2018 Task 2: Multilingual Emoji Prediction
Larisa Alexa | Alina Lorenț | Daniela Gîfu | Diana Trandabăț
Proceedings of The 12th International Workshop on Semantic Evaluation

The “Multilingual Emoji Prediction” task focuses on predicting the corresponding emoji for a given tweet. In this paper, we investigate the relation between words and emojis. To do so, we used supervised machine learning (Naive Bayes) and deep learning (Recursive Neural Network).

Apollo at SemEval-2018 Task 9: Detecting Hypernymy Relations Using Syntactic Dependencies
Mihaela Onofrei | Ionuț Hulub | Diana Trandabăț | Daniela Gîfu
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper presents the participation of the Apollo team in SemEval-2018 Task 9 “Hypernym Discovery”, Subtask 1: “General-Purpose Hypernym Discovery”, which aims to produce a ranked list of hypernyms for a specific term. We propose a novel approach for the automatic extraction of hypernymy relations from a corpus by using dependency patterns. We estimated that applying these patterns leads to a higher score than using the traditional lexical patterns.

2014

Transliteration and alignment of parallel texts from Cyrillic to Latin
Mircea Petic | Daniela Gîfu
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article describes a methodology for recovering and preserving old Romanian texts and the problems related to their recognition. Our focus is to create a gold corpus for the Romanian language (the novella Sania) in both alphabets used in Transnistria, Cyrillic and Latin. The resource is available for similar research. The technology is based on transliteration and semiautomatic alignment of parallel texts at the level of letters, lexemes and multiword expressions. We analysed every text segment present in this corpus and discovered other writing conventions at the level of transliteration, academic norms and editorial interventions. These conventions allowed us to elaborate and implement some new heuristics that make a correct automatic transliteration process possible. Sometimes words in Latin script are modified in Cyrillic script for semantic reasons (for instance, the editor’s interpretation). Semantic transliteration is seen as a good practice in rendering multiword expressions from Cyrillic to Latin. Not only does it preserve how a multiword expression sounds in the source script, but it also enables the translator to modify the original text (here, by choosing the most common sense of an expression). Such a technology could be of interest to lexicographers, but also to specialists in computational linguistics seeking to improve current transliteration standards.