Daniela Gifu

Also published as: Daniela Gîfu

2022

pdf abs
FII UAIC at SemEval-2022 Task 6: iSarcasmEval - Intended Sarcasm Detection in English and Arabic
Tudor Manoleasa | Daniela Gifu | Iustin Sandu
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

The “iSarcasmEval - Intended Sarcasm Detection in English and Arabic” task at the SemEval 2022 competition focuses on detectingand rating the distinction between intendedand perceived sarcasm in the context of textual sarcasm detection, as well as the level ofirony contained in these texts. In the contextof SemEval, we present a binary classificationmethod which classifies the text as sarcasticor non-sarcastic (task A, for English) based onfive classical machine learning approaches bytrying to train the models based on this datasetsolely (i.e., no other datasets have been used).This process indicates low performance compared to previously studied datasets, which in2dicates that the previous ones might be biased.

pdf abs
A Survey of Guidelines and Best Practices for the Generation, Interlinking, Publication, and Validation of Linguistic Linked Data
Fahad Khan | Christian Chiarcos | Thierry Declerck | Maria Pia Di Buono | Milan Dojchinovski | Jorge Gracia | Giedre Valunaite Oleskeviciene | Daniela Gifu
Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference

This article discusses a survey carried out within the NexusLinguarum COST Action which aimed to give an overview of existing guidelines (GLs) and best practices (BPs) in linguistic linked data. In particular it focused on four core tasks in the production/publication of linked data: generation, interlinking, publication, and validation. We discuss the importance of GLs and BPs for LLD before describing the survey and its results in full. Finally we offer a number of directions for future work in order to address the findings of the survey.

2021

pdf abs
FII_CROSS at SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation
Ciprian Bodnar | Andrada Tapuc | Cosmin Pintilie | Daniela Gifu | Diana Trandabat
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper presents a word-in-context disambiguation system. The task focuses on capturing the polysemous nature of words in a multilingual and cross-lingual setting, without considering a strict inventory of word meanings. The system applies Natural Language Processing algorithms on datasets from SemEval 2021 Task 2, being able to identify the meaning of words for the languages Arabic, Chinese, English, French and Russian, without making use of any additional mono- or multilingual resources.

pdf abs
FII FUNNY at SemEval-2021 Task 7: HaHackathon: Detecting and rating Humor and Offense
Mihai Samson | Daniela Gifu
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

The “HaHackathon: Detecting and Rating Humor and Offense” task at the SemEval 2021 competition focuses on detecting and rating the humor level in sentences, as well as the level of offensiveness contained in these texts with humoristic tones. In this paper, we present an approach based on recent Deep Learning techniques by both trying to train the models based on the dataset solely and by trying to fine-tune pre-trained models on the gigantic corpus.

2020

This paper describes the on-going work carried out within the CoBiLiRo (Bimodal Corpus for Romanian Language) research project, part of ReTeRom (Resources and Technologies for Developing Human-Machine Interfaces in Romanian). Data annotation finds increasing use in speech recognition and synthesis with the goal to support learning processes. In this context, a variety of different annotation systems for application to Speech and Text Processing environments have been presented. Even if many designs for the data annotations workflow have emerged, the process of handling metadata, to manage complex user-defined annotations, is not covered enough. We propose a design of the format aimed to serve as an annotation standard for bimodal resources, which facilitates searching, editing and statistical analysis operations over it. The design and implementation of an infrastructure that houses the resources are also presented. The goal is widening the dissemination of bimodal corpora for research valorisation and use in applications. Also, this study reports on the main operations of the web Platform which hosts the corpus and the automatic conversion flows that brings the submitted files at the format accepted by the Platform.

pdf abs
FII-UAIC at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text Using CNN
Lavinia Aparaschivei | Andrei Palihovici | Daniela Gîfu
Proceedings of the Fourteenth Workshop on Semantic Evaluation

The “Sentiment Analysis for Code-Mixed Social Media Text” task at the SemEval 2020 competition focuses on sentiment analysis in code-mixed social media text , specifically, on the combination of English with Spanish (Spanglish) and Hindi (Hinglish). In this paper, we present a system able to classify tweets, from Spanish and English languages, into positive, negative and neutral. Firstly, we built a classifier able to provide corresponding sentiment labels. Besides the sentiment labels, we provide the language labels at the word level. Secondly, we generate a word-level representation, using Convolutional Neural Network (CNN) architecture. Our solution indicates promising results for the Sentimix Spanglish-English task (0.744), the team, Lavinia_Ap, occupied the 9th place. However, for the Sentimix Hindi-English task (0.324) the results have to be improved.

pdf abs
UAIC1860 at SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles
Vlad Ermurachi | Daniela Gifu
Proceedings of the Fourteenth Workshop on Semantic Evaluation

The “Detection of Propaganda Techniques in News Articles” task at the SemEval 2020 competition focuses on detecting and classifying propaganda, pervasive in news article. In this paper, we present a system able to evaluate on sentence level, three traditional text representation techniques for these study goals, using: tf*idf, word and character n-grams. Firstly, we built a binary classifier able to provide corresponding propaganda labels, propaganda or non-propaganda. Secondly, we build a multilabel multiclass model to identify applied propaganda.

pdf abs
A Real-Time System for Credibility on Twitter
Adrian Iftene | Daniela Gifu | Andrei-Remus Miron | Mihai-Stefan Dudu
Proceedings of the Twelfth Language Resources and Evaluation Conference

Nowadays, social media credibility is a pressing issue for each of us who are living in an altered online landscape. The speed of news diffusion is striking. Given the popularity of social networks, more and more users began posting pictures, information, and news about personal life. At the same time, they started to use all this information to get informed about what their friends do or what is happening in the world, many of them arousing much suspicion. The problem we are currently experiencing is that we do not currently have an automatic method of figuring out in real-time which news or which users are credible and which are not, what is false or what is true on the Internet. The goal of this is to analyze Twitter in real-time using neural networks in order to provide us key elements about both the credibility of tweets and users who posted them. Thus, we make a real-time heatmap using information gathered from users to create overall images of the areas from which this fake news comes.

2019

pdf abs
Hope at SemEval-2019 Task 6: Mining social media language to discover offensive language
Gabriel Florentin Patras | Diana Florina Lungu | Daniela Gifu | Diana Trandabat
Proceedings of the 13th International Workshop on Semantic Evaluation

User’s content share through social media has reached huge proportions nowadays. However, along with the free expression of thoughts on social media, people risk getting exposed to various aggressive statements. In this paper, we present a system able to identify and classify offensive user-generated content.

2018

pdf abs
EmoIntens Tracker at SemEval-2018 Task 1: Emotional Intensity Levels in #Tweets
Ramona-Andreea Turcu | Sandra Maria Amarandei | Iuliana-Alexandra Flescan-Lovin-Arseni | Daniela Gifu | Diana Trandabat
Proceedings of the 12th International Workshop on Semantic Evaluation

The „Affect in Tweets” task is centered on emotions categorization and evaluation matrix using multi-language tweets (English and Spanish). In this research, SemEval Affect dataset was preprocessed, categorized, and evaluated accordingly (precision, recall, and accuracy). The system described in this paper is based on the implementation of supervised machine learning (Naive Bayes, KNN and SVM), deep learning (NN Tensor Flow model), and decision trees algorithms.

pdf abs
The Dabblers at SemEval-2018 Task 2: Multilingual Emoji Prediction
Larisa Alexa | Alina Lorenț | Daniela Gîfu | Diana Trandabăț
Proceedings of the 12th International Workshop on Semantic Evaluation

The “Multilingual Emoji Prediction” task focuses on the ability of predicting the correspondent emoji for a certain tweet. In this paper, we investigate the relation between words and emojis. In order to do that, we used supervised machine learning (Naive Bayes) and deep learning (Recursive Neural Network).

pdf abs
Apollo at SemEval-2018 Task 9: Detecting Hypernymy Relations Using Syntactic Dependencies
Mihaela Onofrei | Ionuț Hulub | Diana Trandabăț | Daniela Gîfu
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper presents the participation of Apollo’s team in the SemEval-2018 Task 9 “Hypernym Discovery”, Subtask 1: “General-Purpose Hypernym Discovery”, which tries to produce a ranked list of hypernyms for a specific term. We propose a novel approach for automatic extraction of hypernymy relations from a corpus by using dependency patterns. We estimated that the application of these patterns leads to a higher score than using the traditional lexical patterns.

2014

pdf abs
Transliteration and alignment of parallel texts from Cyrillic to Latin
Mircea Petic | Daniela Gîfu
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article describes a methodology of recovering and preservation of old Romanian texts and problems related to their recognition. Our focus is to create a gold corpus for Romanian language (the novella Sania), for both alphabets used in Transnistria ― Cyrillic and Latin. The resource is available for similar researches. This technology is based on transliteration and semiautomatic alignment of parallel texts at the level of letter/lexem/multiwords. We have analysed every text segment present in this corpus and discovered other conventions of writing at the level of transliteration, academic norms and editorial interventions. These conventions allowed us to elaborate and implement some new heuristics that make a correct automatic transliteration process. Sometimes the words of Latin script are modified in Cyrillic script from semantic reasons (for instance, editor’s interpretation). Semantic transliteration is seen as a good practice in introducing multiwords from Cyrillic to Latin. Not only does it preserve how a multiwords sound in the source script, but also enables the translator to modify in the original text (here, choosing the most common sense of an expression). Such a technology could be of interest to lexicographers, but also to specialists in computational linguistics to improve the actual transliteration standards.