We present the SemEval-2018 Task 1: Affect in Tweets, which includes an array of subtasks on inferring the affectual state of a person from their tweet. For each task, we created labeled data from English, Arabic, and Spanish tweets. The individual tasks are: 1. emotion intensity regression, 2. emotion intensity ordinal classification, 3. valence (sentiment) regression, 4. valence ordinal classification, and 5. emotion classification. Seventy-five teams (about 200 team members) participated in the shared task. We summarize the methods, resources, and tools used by the participating teams, with a focus on the techniques and resources that are particularly useful. We also analyze systems for consistent bias towards a particular race or gender. The data is made freely available to further improve our understanding of how people convey emotions through language.
The paper describes the best performing system for the SemEval-2018 Affect in Tweets(English) sub-tasks. The system focuses on the ordinal classification and regression sub-tasks for valence and emotion. For ordinal classification valence is classified into 7 different classes ranging from -3 to 3 whereas emotion is classified into 4 different classes 0 to 3 separately for each emotion namely anger, fear, joy and sadness. The regression sub-tasks estimate the intensity of valence and each emotion. The system performs domain adaptation of 4 different models and creates an ensemble to give the final prediction. The proposed system achieved 1stposition out of 75 teams which participated in the fore-mentioned sub-tasks. We outperform the baseline model by margins ranging from 49.2% to 76.4 %, thus, pushing the state-of-the-art significantly.
This paper describes the results of the first Shared Task on Multilingual Emoji Prediction, organized as part of SemEval 2018. Given the text of a tweet, the task consists of predicting the most likely emoji to be used along such tweet. Two subtasks were proposed, one for English and one for Spanish, and participants were allowed to submit a system run to one or both subtasks. In total, 49 teams participated to the English subtask and 22 teams submitted a system run to the Spanish subtask. Evaluation was carried out emoji-wise, and the final ranking was based on macro F-Score. Data and further information about this task can be found at https://competitions.codalab.org/competitions/17344.
This paper describes our participation in the SemEval-2018 task Multilingual Emoji Prediction. We participated in both English and Spanish subtasks, experimenting with support vector machines (SVMs) and recurrent neural networks. Our SVM classifier obtained the top rank in both subtasks with macro-averaged F1-measures of 35.99% for English and 22.36% for Spanish data sets. Similar to a few earlier attempts, the results with neural networks were not on par with linear SVMs.
This paper presents the first shared task on irony detection: given a tweet, automatic natural language processing systems should determine whether the tweet is ironic (Task A) and which type of irony (if any) is expressed (Task B). The ironic tweets were collected using irony-related hashtags (i.e. #irony, #sarcasm, #not) and were subsequently manually annotated to minimise the amount of noise in the corpus. Prior to distributing the data, hashtags that were used to collect the tweets were removed from the corpus. For both tasks, a training corpus of 3,834 tweets was provided, as well as a test set containing 784 tweets. Our shared tasks received submissions from 43 teams for the binary classification Task A and from 31 teams for the multiclass Task B. The highest classification scores obtained for both subtasks are respectively F1= 0.71 and F1= 0.51 and demonstrate that fine-grained irony classification is much more challenging than binary irony detection.
Detecting irony is an important task to mine fine-grained information from social web messages. Therefore, the Semeval-2018 task 3 is aimed to detect the ironic tweets (subtask A) and their ironic types (subtask B). In order to address this task, we propose a system based on a densely connected LSTM network with multi-task learning strategy. In our dense LSTM model, each layer will take all outputs from previous layers as input. The last LSTM layer will output the hidden representations of texts, and they will be used in three classification task. In addition, we incorporate several types of features to improve the model performance. Our model achieved an F-score of 70.54 (ranked 2/43) in the subtask A and 49.47 (ranked 3/29) in the subtask B. The experimental results validate the effectiveness of our system.
Character identification is a task of entity linking that finds the global entity of each personal mention in multiparty dialogue. For this task, the first two seasons of the popular TV show Friends are annotated, comprising a total of 448 dialogues, 15,709 mentions, and 401 entities. The personal mentions are detected from nominals referring to certain characters in the show, and the entities are collected from the list of all characters in those two seasons of the show. This task is challenging because it requires the identification of characters that are mentioned but may not be active during the conversation. Among 90+ participants, four of them submitted their system outputs and showed strengths in different aspects about the task. Thorough analyses of the distributed datasets, system outputs, and comparative studies are also provided. To facilitate the momentum, we create an open-source project for this task and publicly release a larger and cleaner dataset, hoping to support researchers for more enhanced modeling.
This paper describes our winning contribution to SemEval 2018 Task 4: Character Identification on Multiparty Dialogues. It is a simple, standard model with one key innovation, an entity library. Our results show that this innovation greatly facilitates the identification of infrequent characters. Because of the generic nature of our model, this finding is potentially relevant to any task that requires the effective learning from sparse or imbalanced data.
This paper discusses SemEval-2018 Task 5: a referential quantification task of counting events and participants in local, long-tail news documents with high ambiguity. The complexity of this task challenges systems to establish the meaning, reference and identity across documents. The task consists of three subtasks and spans across three domains. We detail the design of this referential quantification task, describe the participating systems, and present additional analysis to gain deeper insight into their performance.
We present KOI (Knowledge of Incidents), a system that given news articles as input, builds a knowledge graph (KOI-KG) of incidental events. KOI-KG can then be used to efficiently answer questions such “How many killing incidents happened in 2017 that involve Sean?” The required steps in building the KG include: (i) document preprocessing involving word sense disambiguation, named-entity recognition, temporal expression recognition and normalization, and semantic role labeling; (ii) incidental event extraction and coreference resolution via document clustering; and (iii) KG construction and population.
This paper presents the outcomes of the Parsing Time Normalization shared task held within SemEval-2018. The aim of the task is to parse time expressions into the compositional semantic graphs of the Semantically Compositional Annotation of Time Expressions (SCATE) schema, which allows the representation of a wider variety of time expressions than previous approaches. Two tracks were included, one to evaluate the parsing of individual components of the produced graphs, in a classic information extraction way, and another one to evaluate the quality of the time intervals resulting from the interpretation of those graphs. Though 40 participants registered for the task, only one team submitted output, achieving 0.55 F1 in Track 1 (parsing) and 0.70 F1 in Track 2 (intervals).
Temporal information extraction is a challenging task. Here we describe Chrono, a hybrid rule-based and machine learning system that identifies temporal expressions in text and normalizes them into the SCATE schema. After minor parsing logic adjustments, Chrono has emerged as the top performing system for SemEval 2018 Task 6: Parsing Time Normalizations.
This paper describes the NeuroSent system that participated in SemEval 2018 Task 1. Our system takes a supervised approach that builds on neural networks and word embeddings. Word embeddings were built by starting from a repository of user generated reviews. Thus, they are specific for sentiment analysis tasks. Then, tweets are converted in the corresponding vector representation and given as input to the neural network with the aim of learning the different semantics contained in each emotion taken into account by the SemEval task. The output layer has been adapted based on the characteristics of each subtask. Preliminary results obtained on the provided training set are encouraging for pursuing the investigation into this direction.
This paper describes the system used and results obtained for team FOI DSS at SemEval-2018 Task 1: Affect In Tweets. The team participated in all English language subtasks, with a method utilizing transfer learning from LSTM nets trained on large sentiment datasets combined with embeddings and lexical features. For four out of five subtasks, the system performed in the range of 92-95% of the winning systems, in terms of the competition metrics. Analysis of the results suggests that improved pre-processing and addition of more lexical features may further elevate performance.
In this paper, we put forward a system that competed at SemEval-2018 Task 1: “Affect in Tweets”. Our system uses a simple yet effective ensemble method which combines several neural network components. We participate in two subtasks for English tweets: EI-reg and V-reg. For two subtasks, different combinations of neural components are examined. For EI-reg, our system achieves an accuracy of 0.727 in Pearson Correlation Coefficient (all instances) and an accuracy of 0.555 in Pearson Correlation Coefficient (0.5-1). For V-reg, the achieved accuracy scores are respectively 0.835 and 0.670
This paper presents an emotion classification system for English tweets, submitted for the SemEval shared task on Affect in Tweets, subtask 5: Detecting Emotions. The system combines lexicon, n-gram, style, syntactic and semantic features. For this multi-class multi-label problem, we created a classifier chain. This is an ensemble of eleven binary classifiers, one for each possible emotion category, where each model gets the predictions of the preceding models as additional features. The predicted labels are combined to get a multi-label representation of the predictions. Our system was ranked eleventh among thirty five participating teams, with a Jaccard accuracy of 52.0% and macro- and micro-average F1-scores of 49.3% and 64.0%, respectively.
Emotion classification is a new task that combines several disciplines including Artificial Intelligence and Psychology, although Natural Language Processing is perhaps the most challenging area. In this paper, we describe our participation in SemEval-2018 Task1: Affect in Tweets. In particular, we have participated in EI-oc, EI-reg and E-c subtasks for English and Spanish languages.
This paper describes our system created for the SemEval-2018 Task 1: Affect in Tweets (AIT-2018). We participated in both the regression and the ordinal classification subtasks for emotion intensity detection in English, Arabic, and Spanish. For the regression subtask we use the AffectiveTweets system with added features using various word embeddings, lexicons, and LDA. For the ordinal classification we additionally use our Brainy system with features using parse tree, POS tags, and morphological features. The most beneficial features apart from word and character n-grams include word embeddings, POS count and morphological features.
In this paper, we propose an attention-based classifier that predicts multiple emotions of a given sentence. Our model imitates human’s two-step procedure of sentence understanding and it can effectively represent and classify sentences. With emoji-to-meaning preprocessing and extra lexicon utilization, we further improve the model performance. We train and evaluate our model with data provided by SemEval-2018 task 1-5, each sentence of which has several labels among 11 given emotions. Our model achieves 5th/1st rank in English/Spanish respectively.
This paper describes our participation in Affective Tweets task for emotional intensity and sentiment intensity subtasks for English, Spanish, and Arabic languages. We used two approaches, μTC and EvoMSA. The first one is a generic text categorization and regression system; and the second one, a two-stage architecture for Sentiment Analysis. Both approaches are multilingual and domain independent.
In this paper we present our system for detecting valence task. The major issue was to apply a state-of-the-art system despite the small dataset provided: the system would quickly overfit. The main idea of our proposal is to use transfer learning, which allows to avoid learning from scratch. Indeed, we start to train a first model to predict if a tweet is positive, negative or neutral. For this we use an external dataset which is larger and similar to the target dataset. Then, the pre-trained model is re-used as the starting point to train a new model that classifies a tweet into one of the seven various levels of sentiment intensity. Our system, trained using transfer learning, achieves 0.776 and 0.763 respectively for Pearson correlation coefficient and weighted quadratic kappa metrics on the subtask evaluation dataset.
This paper describes our approach to SemEval-2018 Task1: Estimation of Affects in Tweet for 1a and 2a. Our team KDE-AFFECT employs several methods including one-dimensional Convolutional Neural Network for n-grams, together with word embedding and other preprocessing such as vocabulary unification and Emoji conversion into four emotional words.
Written communication lacks the multimodal features such as posture, gesture and gaze that make it easy to model affective states. Especially in social media such as Twitter, due to the space constraints, the sources of information that can be mined are even more limited due to character limitations. These limitations constitute a challenge for understanding short social media posts. In this paper, we present an approach that utilizes multiple binary classifiers that represent different affective categories to model Twitter posts (e.g., tweets). We train domain-independent recurrent neural network models without any outside information such as affect lexicons. We then use these domain independent binary ranking models to evaluate the applicability of such deep learning models on the affect identification task. This approach allows different model architectures and parameter settings for each affect category instead of building one single multi-label classifier. The contributions of this paper are two-folds: we show that modeling tweets with a small training set is possible with the use of RNNs and we also prove that formulating affect identification as a binary classification task is highly effective.
In this paper, we describe our contribution in SemEval-2018 contest. We tackled task 1 “Affect in Tweets”, subtask E-c “Detecting Emotions (multi-label classification)”. A multilabel classification system Tw-StAR was developed to recognize the emotions embedded in Arabic, English and Spanish tweets. To handle the multi-label classification problem via traditional classifiers, we employed the binary relevance transformation strategy while a TF-IDF scheme was used to generate the tweets’ features. We investigated using single and combinations of several preprocessing tasks to further improve the performance. The results showed that specific combinations of preprocessing tasks could significantly improve the evaluation measures. This has been later emphasized by the official results as our system ranked 3rd for both Arabic and Spanish datasets and 14th for the English dataset.
The paper describes our approach for SemEval-2018 Task 1: Affect Detection in Tweets. We perform experiments with manually compelled sentiment lexicons and word embeddings. We test their performance on twitter affect detection task to determine which features produce the most informative representation of a sentence. We demonstrate that general-purpose word embeddings produces more informative sentence representation than lexicon features. However, combining lexicon features with embeddings yields higher performance than embeddings alone.
The „Affect in Tweets” task is centered on emotions categorization and evaluation matrix using multi-language tweets (English and Spanish). In this research, SemEval Affect dataset was preprocessed, categorized, and evaluated accordingly (precision, recall, and accuracy). The system described in this paper is based on the implementation of supervised machine learning (Naive Bayes, KNN and SVM), deep learning (NN Tensor Flow model), and decision trees algorithms.
We propose a novel attentive hybrid GRU-based network (SAHGN), which we used at SemEval-2018 Task 1: Affect in Tweets. Our network has two main characteristics, 1) has the ability to internally optimize its feature representation using attention mechanisms, and 2) provides a hybrid representation using a character level Convolutional Neural Network (CNN), as well as a self-attentive word-level encoder. The key advantage of our model is its ability to signify the relevant and important information that enables self-optimization. Results are reported on the valence intensity regression task.
Traditional sentiment analysis approaches mainly focus on classifying the sentiment polarities or emotion categories of texts. However, they can’t exploit the sentiment intensity information. Therefore, the SemEval-2018 Task 1 is aimed to automatically determine the intensity of emotions or sentiment of tweets to mine fine-grained sentiment information. In order to address this task, we propose a system based on an attention CNN-LSTM model. In our model, LSTM is used to extract the long-term contextual information from texts. We apply attention techniques to selecting this information. A CNN layer with different size of kernels is used to extract local features. The dense layers take the pooled CNN feature maps and predict the intensity scores. Our system reaches average Pearson correlation score of 0.722 (ranked 12/48) in emotion intensity regression task, and 0.810 in valence regression task (ranked 15/38). It indicates that our system can be further extended.
This paper describes our system that has been used in Task1 Affect in Tweets. We combine two different approaches. The first one called N-Stream ConvNets, which is a deep learning approach where the second one is XGboost regressor based on a set of embedding and lexicons based features. Our system was evaluated on the testing sets of the tasks outperforming all other approaches for the Arabic version of valence intensity regression task and valence ordinal classification task.
In this paper we present our contribution to SemEval-2018, a classifier for classifying multi-label emotions of Arabic and English tweets. We attempted “Affect in Tweets”, specifically Task E-c: Detecting Emotions (multi-label classification). Our method is based on preprocessing the tweets and creating word vectors combined with a self correction step to remove noise. We also make use of emotion specific thresholds. The final submission was selected upon the best performance achieved, selected when using a range of thresholds. Our system was evaluated on the Arabic and English datasets provided for the task by the competition organisers, where it ranked 2nd for the Arabic dataset (out of 14 entries) and 12th for the English dataset (out of 35 entries).
In this paper we describe our submission to SemEval-2018 Task 1: Affects in Tweets. The model which we present is an ensemble of various neural architectures and gradient boosted trees, and employs three different types of vectorial tweet representations. Furthermore, our system is language-independent and ranked first in 5 out of the 12 subtasks in which we participated, while achieving competitive results in the remaining ones. Comparatively remarkable performance is observed on both the Arabic and Spanish languages.
This paper describes the participation of Amobee in the shared sentiment analysis task at SemEval 2018. We participated in all the English sub-tasks and the Spanish valence tasks. Our system consists of three parts: training task-specific word embeddings, training a model consisting of gated-recurrent-units (GRU) with a convolution neural network (CNN) attention mechanism and training stacking-based ensembles for each of the sub-tasks. Our algorithm reached the 3rd and 1st places in the valence ordinal classification sub-tasks in English and Spanish, respectively.
This paper describes our system implementation for subtask V-oc of SemEval-2018 Task 1: affect in tweets. We use multi-task learning method to learn shared representation, then learn the features for each task. There are five classification models in the proposed multi-task learning approach. These classification models are trained sequentially to learn different features for different classification tasks. In addition to the data released for SemEval-2018, we use datasets from previous SemEvals during system construction. Our Pearson correlation score is 0.638 on the official SemEval-2018 Task 1 test set.
This paper describes our submissions to SemEval 2018 task 1. The task is affect intensity prediction in tweets, including five subtasks. We participated in all subtasks of English tweets. We extracted several traditional NLP, sentiment lexicon, emotion lexicon and domain specific features from tweets, adopted supervised machine learning algorithms to perform emotion intensity prediction.
While significant progress has been achieved for Opinion Mining in Arabic (OMA), very limited efforts have been put towards the task of Emotion mining in Arabic. In fact, businesses are interested in learning a fine-grained representation of how users are feeling towards their products or services. In this work, we describe the methods used by the team Emotion Mining in Arabic (EMA), as part of the SemEval-2018 Task 1 for Affect Mining for Arabic tweets. EMA participated in all 5 subtasks. For the five tasks, several preprocessing steps were evaluated and eventually the best system included diacritics removal, elongation adjustment, replacement of emojis by the corresponding Arabic word, character normalization and light stemming. Moreover, several features were evaluated along with different classification and regression techniques. For the 5 subtasks, word embeddings feature turned out to perform best along with Ensemble technique. EMA achieved the 1st place in subtask 5, and 3rd place in subtasks 1 and 3.
In this paper we present deep-learning models that submitted to the SemEval-2018 Task 1 competition: “Affect in Tweets”. We participated in all subtasks for English tweets. We propose a Bi-LSTM architecture equipped with a multi-layer self attention mechanism. The attention mechanism improves the model performance and allows us to identify salient words in tweets, as well as gain insight into the models making them more interpretable. Our model utilizes a set of word2vec word embeddings trained on a large collection of 550 million Twitter messages, augmented by a set of word affective features. Due to the limited amount of task-specific training data, we opted for a transfer learning approach by pretraining the Bi-LSTMs on the dataset of Semeval 2017, Task 4A. The proposed approach ranked 1st in Subtask E “Multi-Label Emotion Classification”, 2nd in Subtask A “Emotion Intensity Regression” and achieved competitive results in other subtasks.
While sentiment and emotion analysis has received a considerable amount of research attention, the notion of understanding and detecting the intensity of emotions is relatively less explored. This paper describes a system developed for predicting emotion intensity in tweets. Given a Twitter message, CrystalFeel uses features derived from parts-of-speech, n-grams, word embedding, and multiple affective lexicons including Opinion Lexicon, SentiStrength, AFFIN, NRC Emotion & Hash Emotion, and our in-house developed EI Lexicons to predict the degree of the intensity associated with fear, anger, sadness, and joy in the tweet. We found that including the affective lexicons-based features allowed the system to obtain strong prediction performance, while revealing interesting emotion word-level and message-level associations. On gold test data, CrystalFeel obtained Pearson correlations of 0.717 on average emotion intensity and of 0.816 on sentiment intensity.
This paper describes our system that has been submitted to SemEval-2018 Task 1: Affect in Tweets (AIT) to solve five subtasks. We focus on modeling both sentence and word level representations of emotion inside texts through large distantly labeled corpora with emojis and hashtags. We transfer the emotional knowledge by exploiting neural network models as feature extractors and use these representations for traditional machine learning models such as support vector regression (SVR) and logistic regression to solve the competition tasks. Our system is placed among the Top3 for all subtasks we participated.
We implemented the sentiment system in all five subtasks for English and Spanish. All subtasks involve emotion or sentiment intensity prediction (regression and ordinal classification) and emotions determining (multi-labels classification). The useful BiLSTM (Bidirectional Long-Short Term Memory) model with attention mechanism was mainly applied for our system. We use BiLSTM in order to get word information extracted from both directions. The attention mechanism was used to find the contribution of each word for improving the scores. Furthermore, based on BiLSTMATT (BiLSTM with attention mechanism) a few deep-learning algorithms were employed for different subtasks. For regression and ordinal classification tasks we used domain adaptation and ensemble learning methods to leverage base model. While a single base model was used for multi-labels task.
The present study describes our submission to SemEval 2018 Task 1: Affect in Tweets. Our Spanish-only approach aimed to demonstrate that it is beneficial to automatically generate additional training data by (i) translating training data from other languages and (ii) applying a semi-supervised learning method. We find strong support for both approaches, with those models outperforming our regular models in all subtasks. However, creating a stepwise ensemble of different models as opposed to simply averaging did not result in an increase in performance. We placed second (EI-Reg), second (EI-Oc), fourth (V-Reg) and fifth (V-Oc) in the four Spanish subtasks we participated in.
This paper presents a UIR-Miner system for emotion and sentiment analysis evaluation in Twitter in SemEval 2018. Our system consists of three main modules: preprocessing module, stacking module to solve the intensity prediction of emotion and sentiment, LSTM network module to solve multi-label classification, and the hierarchical attention network module for solving emotion and sentiment classification problem. According to the metrics of SemEval 2018, our system gets the final scores of 0.636, 0.531, 0.731, 0.708, and 0.408 on 5 subtasks, respectively.
This paper presents system description of our submission to the SemEval-2018 task-1: Affect in tweets for the English language. We combine three different features generated using deep learning models and traditional methods in support vector machines to create a unified ensemble system. A robust representation of a tweet is learned using a multi-attention based architecture which uses a mixture of different pre-trained embeddings. In addition to this analysis of different features is also presented. Our system ranked 2nd, 5th, and 7th in different subtasks among 75 teams.
This paper describes a system attended in the SemEval-2018 Task 1 “Affect in tweets” that predicts emotional intensities. We use Group LSTM with an attention model and transfer learning with sentiment classification data as a source data (SemEval 2017 Task 4a). A transfer model structure consists of a source domain and a target domain. Additionally, we try a new dropout that is applied to LSTMs in the Group LSTM. Our system ranked 8th at the subtask 1a (emotion intensity regression). We also show various results with different architectures in the source, target and transfer models.
In this paper, we propose a regression system to infer the emotion intensity of a tweet. We develop a multi-aspect feature learning mechanism to capture the most discriminative semantic features of a tweet as well as the emotion information conveyed by each word in it. We combine six types of feature groups: (1) a tweet representation learned by an LSTM deep neural network on the training data, (2) a tweet representation learned by an LSTM network on a large corpus of tweets that contain emotion words (a distant supervision corpus), (3) word embeddings trained on the distant supervision corpus and averaged over all words in a tweet, (4) word and character n-grams, (5) features derived from various sentiment and emotion lexicons, and (6) other hand-crafted features. As part of the word embedding training, we also learn the distributed representations of multi-word expressions (MWEs) and negated forms of words. An SVR regressor is then trained over the full set of features. We evaluate the effectiveness of our ensemble feature sets on the SemEval-2018 Task 1 datasets and achieve a Pearson correlation of 72% on the task of tweet emotion intensity prediction.
This paper presents a method for Affect in Tweets, which is the task to automatically determine the intensity of emotions and intensity of sentiment of tweets. The term affect refers to emotion-related categories such as anger, fear, etc. Intensity of emo-tions need to be quantified into a real valued score in [0, 1]. We propose an en-semble system including four different deep learning methods which are CNN, Bidirectional LSTM (BLSTM), LSTM-CNN and a CNN-based Attention model (CA). Our system gets an average Pearson correlation score of 0.682 in the subtask EI-reg and an average Pearson correlation score of 0.784 in subtask V-reg, which ranks 17th among 48 systems in EI-reg and 19th among 38 systems in V-reg.
In this paper we did an analysis of “Affects in Tweets” which was one of the task conducted by semeval 2018. Task was to build a model which is able to do regression and classification of different emotions from the given tweets data set. We developed a base model for all the subtasks using distributed representation (Doc2Vec) and applied machine learning techniques for classification and regression. Distributed representation is an unsupervised algorithm which is capable of learning fixed length feature representation from variable length texts. Machine learning techniques used for regression is ’Linear Regression’ while ’Random Forest Tree’ is used for classification purpose. Empirical results obtained for all the subtasks by our model are shown in this paper.
The system developed by the SSN MLRG1 team for Semeval-2018 task 1 on affect in tweets uses rule based feature selection and one-hot encoding to generate the input feature vector. Multilayer Perceptron was used to build the model for emotion intensity ordinal classification, sentiment analysis ordinal classification and emotion classfication subtasks. Support Vector Machine was used to build the model for emotion intensity regression and sentiment intensity regression subtasks.
This paper discusses on task 1, “Affect in Tweets” sharedtask, conducted in SemEval-2018. This task comprises of various subtasks, which required participants to analyse over different emotions and sentiments based on the provided tweet data and also measure the intensity of these emotions for subsequent subtasks. Our approach in these task was to come up with a model on count based representation and use machine learning techniques for regression and classification related tasks. In this work, we use a simple bag of words technique for supervised text classification model as to compare, that even with some advance distributed representation models we can still achieve significant accuracy. Further, fine tuning on various parameters for the bag of word, representation model we acquired better scores over various other baseline models (Vinayan et al.) participated in the sharedtask.
Emotions are a way of expressing human sentiments. In the modern era, social media is a platform where we convey our emotions. These emotions can be joy, anger, sadness and fear. Understanding the emotions from the written sentences is an interesting part in knowing about the writer. In the amount of digital language shared through social media, a considerable amount of data reflects the sentiment or emotion towards some product, person and organization. Since these texts are from users with diverse social aspects, these texts can be used to enrich the application related to the business intelligence. More than the sentiment, identification of intensity of the sentiment will enrich the performance of the end application. In this paper we experimented the intensity prediction as a text classification problem that evaluates the distributed representation text using aggregated sum and dimensionality reduction of the glove vectors of the words present in the respective texts .
This paper discusses the experiments performed for predicting the emotion intensity in tweets using a generalized supervised learning approach. We extract 3 kind of features from each of the tweets - one denoting the sentiment and emotion metrics obtained from different sentiment lexicons, one denoting the semantic representation of the word using dense representations like Glove, Word2vec and finally the syntactic information through POS N-grams, Word clusters, etc. We provide a comparative analysis of the significance of each of these features individually and in combination tested over standard regressors avaliable in scikit-learn. We apply an ensemble of these models to choose the best combination over cross validation.
This paper describes MuTuX, our system that is designed for task 1-5a, emotion classification analysis of tweets on SemEval2018. The system aims at exploring the potential of context information of terms for emotion analysis. A Recurrent Neural Network is adopted to capture the context information of terms in tweets. Only term features and the sequential relations are used in our system. The results submitted ranks 16th out of 35 systems on the task of emotion detection in English-language tweets.
Task 1 in the International Workshop SemEval 2018, Affect in Tweets, introduces five subtasks (El-reg, El-oc, V-reg, V-oc, and E-c) to detect the intensity of emotions in English, Arabic, and Spanish tweets. This paper describes TeamUNCC’s system to detect emotions in English and Arabic tweets. Our approach is novel in that we present the same architecture for all the five subtasks in both English and Arabic. The main input to the system is a combination of word2vec and doc2vec embeddings and a set of psycholinguistic features (e.g. from AffectTweets Weka-package). We apply a fully connected neural network architecture and obtain performance results that show substantial improvements in Spearman correlation scores over the baseline models provided by Task 1 organizers, (ranging from 0.03 to 0.23). TeamUNCC’s system ranks third in subtask El-oc and fourth in other subtasks for Arabic tweets.
We present our methods and results for affect analysis in Twitter developed as a part of SemEval-2018 Task 1, where the sub-tasks involve predicting the intensity of emotion, the intensity of sentiment, and valence for tweets. For modeling, though we use a traditional LSTM network, we combine our model with several state-of-the-art techniques to improve its performance in a low-resource setting. For example, we use an encoder-decoder network to initialize the LSTM weights. Without any task specific optimization we achieve competitive results (macro-average Pearson correlation coefficient 0.696) in the El-reg task. In this paper, we describe our development strategy in detail along with an exposition of our results.
This article describes our proposed Arabic Sentiment Analysis system named ARB-SEN. This system is designed for the International Workshop on Semantic Evaluation 2018 (SemEval-2018), Task1: Affect in Tweets. ARB-SEN proposes two supervised models to estimate the sentiment intensity in Arabic tweets. Both models use a set of features including sentiment lexicon, negation, word embedding and emotion symbols features. Our system combines these features to assist the sentiment analysis task. ARB-SEN system achieves a correlation score of 0.720, ranking 6th among all participants in the valence intensity regression (V-reg) for the Arabic sub-task organized within the SemEval 2018 evaluation campaign.
In this paper, we describe the first attempt to perform transfer learning from sentiment to emotions. Our system employs Long Short-Term Memory (LSTM) networks, including bidirectional LSTM (biLSTM) and LSTM with attention mechanism. We perform transfer learning by first pre-training the LSTM networks on sentiment data before concatenating the penultimate layers of these networks into a single vector as input to new dense layers. For the E-c subtask, we utilize a novel approach to train models for correlated emotion classes. Our system performs 4/48, 3/39, 8/38, 4/37, 4/35 on all English subtasks EI-reg, EI-oc, V-reg, V-oc, E-c of SemEval 2018 Task 1: Affect in Tweets.
Our submission to the SemEval-2018 Task1: Affect in Tweets shared task competition is a supervised learning model relying on standard lexicon features coupled with word embedding features. We used an ensemble of diverse models, including random forests, gradient boosted trees, and linear models, corrected for training-development set mismatch. We submitted the system’s output for subtasks 1 (emotion intensity prediction), 2 (emotion ordinal classification), 3 (valence intensity regression) and 4 (valence ordinal classification), for English tweets. We placed 25th, 19th, 24th and 15th in the four subtasks respectively. The baseline considered was an SVM (Support Vector Machines) model with linear kernel on the lexicon and embedding based features. Our system’s final performance measured in Pearson correlation scores outperformed the baseline by a margin of 2.2% to 14.6% across all tasks.
We take a multi-task learning approach to the shared Task 1 at SemEval-2018. The general idea concerning the model structure is to use as little external data as possible in order to preserve the task relatedness and reduce complexity. We employ multi-task learning with hard parameter sharing to exploit the relatedness between sub-tasks. As a base model, we use a standard recurrent neural network for both the classification and regression subtasks. Our system ranks 32nd out of 48 participants with a Pearson score of 0.557 in the first subtask, and 20th out of 35 in the fifth subtask with an accuracy score of 0.464.
This paper describes our system used in the English Emoji Prediction Task 2 at the SemEval-2018. Our system is based on two supervised machine learning algorithms: Gradient Boosting Regression Tree Method (GBM) and Bidirectional Long Short-term Memory Network (BLSTM). Besides the common features, we extract various lexicon and syntactic features from external resources. After comparing the results of two algorithms, GBM is chosen for the final evaluation.
This paper describes the UMDSub system that participated in Task 2 of SemEval-2018. We developed a system that predicts an emoji given the raw text in a English tweet. The system is a Multi-channel Convolutional Neural Network based on subword embeddings for the representation of tweets. This model improves on character or word based methods by about 2%. Our system placed 21st of 48 participating systems in the official evaluation.
In this paper, we present our system for assigning an emoji to a tweet based on the text. Each tweet was originally posted with an emoji which the task providers removed. Our task was to decide out of 20 emojis, which originally came with the tweet. Two datasets were provided - one in English and the other in Spanish. We treated the task as a standard classification task with the emojis as our classes and the tweets as our documents. Our best performing system used a Bag of Words model with a Linear Support Vector Machine as its’ classifier. We achieved a macro F1 score of 32.73% for the English data and 17.98% for the Spanish data.
The “Multilingual Emoji Prediction” task focuses on the ability of predicting the correspondent emoji for a certain tweet. In this paper, we investigate the relation between words and emojis. In order to do that, we used supervised machine learning (Naive Bayes) and deep learning (Recursive Neural Network).
Emojis are widely used by social media and social network users when posting their messages. It is important to study the relationships between messages and emojis. Thus, in SemEval-2018 Task 2 an interesting and challenging task is proposed, i.e., predicting which emojis are evoked by text-based tweets. We propose a residual CNN-LSTM with attention (RCLA) model for this task. Our model combines CNN and LSTM layers to capture both local and long-range contextual information for tweet representation. In addition, attention mechanism is used to select important components. Besides, residual connection is applied to CNN layers to facilitate the training of neural networks. We also incorporated additional features such as POS tags and sentiment features extracted from lexicons. Our model achieved 30.25% macro-averaged F-score in the first subtask (i.e., emoji prediction in English), ranking 7th out of 48 participants.
In this paper, we describe a methodology to predict emoji in tweets. Our approach is based on the classic bag-of-words model in conjunction with word embeddings. The used classification algorithm was Logistic Regression. This architecture was used and evaluated in the context of the SemEval 2018 challenge (task 2, subtask 1).
The advent of social media has brought along a novel way of communication where meaning is composed by combining short text messages and visual enhancements, the so-called emojis. We describe our system for participating in SemEval-2018 Task 2 on Multilingual Emoji Prediction. Our approach relies on combining a rich set of various types of features: semantic and metadata. The most important types turned out to be the metadata feature. In subtask 1: Emoji Prediction in English, our primary submission obtain a MAP of 16.45, Precision of 31.557, Recall of 16.771 and Accuracy of 30.992.
This paper presents our single model to Subtask 1 of SemEval 2018 Task 2: Emoji Prediction in English. In order to predict the emoji that may be contained in a tweet, the basic model we use is an attention-based recurrent neural network which has achieved satisfactory performs in Natural Language processing. Considering the text comes from social media, it contains many discrepant abbreviations and online terms, we also combine word-level and character-level word vector embedding to better handling the words not appear in the vocabulary. Our single model1 achieved 29.50% Macro F-score in test data and ranks 9th among 48 teams.
This paper describes our participation in SemEval 2018 Task 2: Multilingual Emoji Prediction, in which participants are asked to predict a tweet’s most associated emoji from 20 emojis. Instead of regarding it as a 20-class classification problem we regard it as a text similarity problem. We propose a vector similarity based approach for this task. First the distributed representation (tweet vector) for each tweet is generated, then the similarity between this tweet vector and each emoji’s embedding is evaluated. The most similar emoji is chosen as the predicted label. Experimental results show that our approach performs comparably with the classification approach and shows its advantage in classifying emojis with similar semantic meaning.
This paper describes our submissions to Task 2 in SemEval 2018, i.e., Multilingual Emoji Prediction. We first investigate several traditional Natural Language Processing (NLP) features, and then design several deep learning models. For subtask 1: Emoji Prediction in English, we combine two different methods to represent tweet, i.e., supervised model using traditional features and deep learning model. For subtask 2: Emoji Prediction in Spanish, we only use deep learning model.
In this paper we present a deep-learning model that competed at SemEval-2018 Task 2 “Multilingual Emoji Prediction”. We participated in subtask A, in which we are called to predict the most likely associated emoji in English tweets. The proposed architecture relies on a Long Short-Term Memory network, augmented with an attention mechanism, that conditions the weight of each word, on a “context vector” which is taken as the aggregation of a tweet’s meaning. Moreover, we initialize the embedding layer of our model, with word2vec word embeddings, pretrained on a dataset of 550 million English tweets. Finally, our model does not rely on hand-crafted features or lexicons and is trained end-to-end with back-propagation. We ranked 2nd out of 48 teams.
As part of a SemEval 2018 shared task an attempt was made to build a system capable of predicting the occurence of a language’s most frequently used emoji in Tweets. Specifically, models for English and Spanish data were created and trained on 500.000 and 100.000 tweets respectively. In order to create these models, first a logistic regressor, a sequential LSTM, a random forest regressor and a SVM were tested. The latter was found to perform best and therefore optimized individually for both languages. During developmet f1-scores of 61 and 82 were obtained for English and Spanish data respectively, in comparison, f1-scores on the official evaluation data were 21 and 18. The significant decrease in performance during evaluation might be explained by overfitting during development and might therefore have partially be prevented by using cross-validation. Over all, emoji which occur in a very specific context such as a Christmas tree were found to be most predictable.
This paper describes our approach, called EPUTION, for the open trial of the SemEval- 2018 Task 2, Multilingual Emoji Prediction. The task relates to using social media — more precisely, Twitter — with its aim to predict the most likely associated emoji of a tweet. Our solution for this text classification problem explores the idea of transfer learning for adapting the classifier based on users’ tweeting history. Our experiments show that our user-adaption method improves classification results by more than 6 per cent on the macro-averaged F1. Thus, our paper provides evidence for the rationality of enriching the original corpus longitudinally with user behaviors and transferring the lessons learned from corresponding users to specific instances.
We present a system for emoji prediction on English and Spanish tweets, prepared for the SemEval-2018 task on Multilingual Emoji Prediction. We compared the performance of an SVM, LSTM and an ensemble of these two. We found the SVM performed best on our development set with an accuracy of 61.3% for English and 83% for Spanish. The features used for the SVM are lowercased word n-grams in the range of 1 to 20, tokenised by a TweetTokenizer and stripped of stop words. On the test set, our model achieved an accuracy of 34% on English, with a slightly lower score of 29.7% accuracy on Spanish.
This paper describes our approach to SemEval-2018 Task 2, which aims to predict the most likely associated emoji, given a tweet in English or Spanish. We normalized text-based tweets during pre-processing, following which we utilized a bi-directional gated recurrent unit with an attention mechanism to build our base model. Multi-models with or without class weights were trained for the ensemble methods. We boosted models without class weights, and only strong boost classifiers were identified. In our system, not only was a boosting method used, but we also took advantage of the voting ensemble method to enhance our final system result. Our method demonstrated an obvious improvement of approximately 3% of the macro F1 score in English and 2% in Spanish.
This paper describes the approach that was developed for SemEval 2018 Task 2 (Multilingual Emoji Prediction) by the DUTH Team. First, we employed a combination of pre-processing techniques to reduce the noise of tweets and produce a number of features. Then, we built several N-grams, to represent the combination of word and emojis. Finally, we trained our system with a tuned LinearSVC classifier. Our approach in the leaderboard ranked 18th amongst 48 teams.
Emojis are widely used on social media andunderstanding their meaning is important forboth practical purposes (e.g. opinion mining,sentiment detection) and theoretical purposes(e.g. how different L1 speakers use them, dothey have some syntax?); this paper presents aset of experiments that aim to predict a singleemoji from a tweet. We built different mod-els and we found that the test results are verydifferent from the validation results.
In this paper, we present SyntNN as a way to include traditional syntactic models in multilayer neural networks used in the task of Semeval Task 2 of emoji prediction. The model builds on the distributed tree embedder also known as distributed tree kernel. Initial results are extremely encouraging but additional analysis is needed to overcome the problem of overfitting.
This paper describes the Duluth UROP systems that participated in SemEval–2018 Task 2, Multilingual Emoji Prediction. We relied on a variety of ensembles made up of classifiers using Naive Bayes, Logistic Regression, and Random Forests. We used unigram and bigram features and tried to offset the skewness of the data through the use of oversampling. Our task evaluation results place us 19th of 48 systems in the English evaluation, and 5th of 21 in the Spanish. After the evaluation we realized that some simple changes to our pre-processing could significantly improve our results. After making these changes we attained results that would have placed us sixth in the English evaluation, and second in the Spanish.
Emoji is one of the “fastest growing language ” in pop-culture, especially in social media and it is very unlikely for its usage to decrease. These are generally used to bring an extra level of meaning to the texts, posted on social media platforms. Providing such an added info, gives more insights to the plain text, arising to hidden interpretation within the text. This paper explains our analysis on Task 2, ” Multilingual Emoji Prediction” sharedtask conducted by Semeval-2018. In the task, a predicted emoji based on a piece of Twitter text are labelled under 20 different classes (most commonly used emojis) where these classes are learnt and further predicted are made for unseen Twitter text. In this work, we have experimented and analysed emojis predicted based on Twitter text, as a classification problem where the entailing emoji is considered as a label for every individual text data. We have implemented this using distributed representation of text through fastText. Also, we have made an effort to demonstrate how fastText framework can be useful in case of emoji prediction. This task is divide into two subtask, they are based on dataset presented in two different languages English and Spanish.
We present our submission to the Semeval 2018 task on emoji prediction. We used a random forest, with an ensemble of bag-of-words, sentiment and psycholinguistic features. Although we performed well on the trial dataset (attaining a macro f-score of 63.185 for English and 81.381 for Spanish), our approach did not perform as well on the test data. We describe our features and classi cation protocol, as well as initial experiments, concluding with a discussion of the discrepancy between our trial and test results.
We present the system built for SemEval-2018 Task 2 on Emoji Prediction. Although Twitter messages are very short we managed to design a wide variety of features: textual, semantic, sentiment, emotion-, and color-related ones. We investigated different methods of text preprocessing including replacing text emojis with respective tokens and splitting hashtags to capture more meaning. To represent text we used word n-grams and word embeddings. We experimented with a wide range of classifiers and our best results were achieved using a SVM-based classifier and a Hierarchical Attention Neural Network.
In this paper we present the system submitted to the SemEval2018 task2 : Multilingual Emoji Prediction. Our system approaches both languages as being equal by first; considering word embeddings associated to automatically computed features of different types, then by applying bagging algorithm RandomForest to predict the emoji of a tweet.
This paper describes the ALANIS system that participated in Task 3 of SemEval-2018. We develop a system for detection of irony, as well as the detection of three types of irony: verbal polar irony, other verbal irony, and situational irony. The system uses a logistic regression model in subtask A and a voted classifier system with manually developed features to identify ironic tweets. This model improves on a naive bayes baseline by about 8 percent on training set.
This paper describes the NeuroSent system that participated in SemEval 2018 Task 3. Our system takes a supervised approach that builds on neural networks and word embeddings. Word embeddings were built by starting from a repository of user generated reviews. Thus, they are specific for sentiment analysis tasks. Then, tweets are converted in the corresponding vector representation and given as input to the neural network with the aim of learning the different semantics contained in each emotion taken into account by the SemEval task. The output layer has been adapted based on the characteristics of each subtask. Preliminary results obtained on the provided training set are encouraging for pursuing the investigation into this direction.
This paper describes our system created for the SemEval-2018 Task 3: Irony detection in English tweets. Our strongly constrained system uses only the provided training data without any additional external resources. Our system is based on Maximum Entropy classifier and various features using parse tree, POS tags, and morphological features. Even without additional lexicons and word embeddings we achieved fourth place in Subtask A and seventh in Subtask B in terms of accuracy.
This paper describes our NIHRIO system for SemEval-2018 Task 3 “Irony detection in English tweets.” We propose to use a simple neural network architecture of Multilayer Perceptron with various types of input features including: lexical, syntactic, semantic and polarity features. Our system achieves very high performance in both subtasks of binary and multi-class irony detection in tweets. In particular, we rank at least fourth using the accuracy metric and sixth using the F1 metric. Our code is available at: https://github.com/NIHRIO/IronyDetectionInTwitter
In this paper we describe our participation in the SemEval-2018 task 3 Shared Task on Irony Detection. We have approached the task with our low dimensionality representation method (LDR), which exploits low dimensional features extracted from text on the basis of the occurrence probability of the words depending on each class. Our intuition is that words in ironic texts have different probability of occurrence than in non-ironic ones. Our approach obtained acceptable results in both subtasks A and B. We have performed an error analysis that shows the difference on correct and incorrect classified tweets.
In this paper we introduce our system for the task of Irony detection in English tweets, a part of SemEval 2018. We propose representation learning approach that relies on a multi-layered bidirectional LSTM, without using external features that provide additional semantic information. Although our model is able to outperform the baseline in the validation set, our results show limited generalization power over the test set. Given the limited size of the dataset, we think the usage of more pre-training schemes would greatly improve the obtained results.
The paper describes our search for a universal algorithm of detecting intentional lexical ambiguity in different forms of creative language. At SemEval-2018 Task 3, we used PunFields, the system of automatic analysis of English puns that we introduced at SemEval-2017, to detect irony in tweets. Preliminary tests showed that it can reach the score of F1=0.596. However, at the competition, its result was F1=0.549.
This paper proposes a novel feature extraction process for SemEval task 3: Irony detection in English tweets. The proposed system incorporates a concatenative featurization of tweet and hashtags, which helps distinguishing between the irony-related and the other components. The system embeds tweets into a vector sequence with widely used pretrained word vectors, partially using a character embedding for the words that are out of vocabulary. Identification was performed with BiLSTM and CNN classifiers, achieving F1 score of 0.5939 (23/42) and 0.3925 (10/28) each for the binary and the multi-class case, respectively. The reliability of the proposed scheme was verified by analyzing the Gold test data, which demonstrates how hashtags can be taken into account when identifying various types of irony.
This paper describes the systems submitted to SemEval 2018 Task 3 “Irony detection in English tweets” for both subtasks A and B. The first system leveraging a combination of sentiment, distributional semantic, and text surface features is ranked third among 44 teams according to the official leaderboard of the subtask A. The second system with slightly different representation of the features ranked ninth in subtask B. We present a method that entails decomposing tweets into separate parts. Searching for contrast within the constituents of a tweet is an integral part of our system. We embrace an extensive definition of contrast which leads to a vast coverage in detecting ironic content.
We propose a Long Short Term Memory Neural Network model for irony detection in tweets in this paper. Our model is trained using word embeddings and emoji embeddings. We show that adding sentiment scores to our model improves the F1 score of our baseline LSTM by approximately .012, and therefore show that high-level features can be used to improve word embeddings in certain Natural Language Processing applications. Our model ranks 24/43 for binary classification and 5/31 for multiclass classification. We make our model easily accessible to the research community.
This paper describes the participation of ELiRF-UPV team at tasks 1 and 3 of Semeval-2018. We present a deep learning based system that assembles Convolutional Neural Networks and Long Short-Term Memory neural networks. This system has been used with slight modifications for the two tasks addressed both for English and Spanish. Finally, the results obtained in the competition are reported and discussed.
This paper describes our system, entitled IronyMagnet, for the 3rd Task of the SemEval 2018 workshop, “Irony Detection in English Tweets”. In Task 1, irony classification task has been considered as a binary classification task. Now for the first time, finer categories of irony are considered as part of a shared task. In task 2, three types of irony are considered; “Irony by contrast” - ironic instances where evaluative expression portrays inverse polarity (positive, negative) of the literal proposition; “Situational irony” - ironic instances where output of a situation do not comply with its expectation; “Other verbal irony” - instances where ironic intent does not rely on polarity contrast or unexpected outcome. We proposed a Siamese neural network for irony detection, which is consisted of two subnetworks, each containing a long short term memory layer(LSTM) and an embedding layer initialized with vectors from Glove word embedding 1 . The system achieved a f-score of 0.72, and 0.50 in task 1, and task 2 respectively.
The objective of this paper is to provide a description for a system built as our participation in SemEval-2018 Task 3 on Irony detection in English tweets. This system classifies a tweet as either ironic or non-ironic through a supervised learning approach. Our approach is to implement three feature models, and to then improve the performance of the supervised learning classification of tweets by combining many data features and using a voting system on four different classifiers. We describe the process of pre-processing data, extracting features, and running different types of classifiers against our feature set. In the competition, our system achieved an F1-score of 0.4675, ranking 35th in subtask A, and an F1-score score of 0.3014 ranking 22th in subtask B.
This paper describes the Irony detection system that participates in SemEval-2018 Task 3: Irony detection in English tweets. The system participated in the subtasks A and B. This paper discusses the results of our system in the development, evaluation and post evaluation. Each class in the dataset is represented as directed unweighted graphs. Then, the comparison is carried out with each class graph which results in a vector. This vector is used as features by machine learning algorithm. The model is evaluated on a hold on strategy. The organizers randomly split 80% (3,833 instances) training set (provided to the participant in training their system) and testing set 20%(958 instances). The test set is reserved to evaluate the performance of participants systems. During the evaluation, our system ranked 23 in the Coda Lab result of the subtask A (binary class problem). The binary class system achieves accuracy 0.6135, precision 0.5091, recall 0.7170 and F measure 0.5955. The subtask B (multi-class problem) system is ranked 22 in Coda Lab results. The multiclass model achieves the accuracy 0.4158, precision 0.4055, recall 0.3526 and f measure 0.3101.
This paper describes the system we submitted to SemEval-2018 Task 3. The aim of the system is to distinguish between irony and non-irony in English tweets. We create a targeted feature set and analyse how different features are useful in the task of irony detection, achieving an F1-score of 0.5914. The analysis of individual features provides insight that may be useful in future attempts at detecting irony in tweets.
This paper describes an ensemble approach to the SemEval-2018 Task 3. The proposed method is composed of two renowned methods in text classification together with a novel approach for capturing ironic content by exploiting a tailored lexicon for irony detection. We experimented with different ensemble settings. The obtained results show that our method has a good performance for detecting the presence of ironic content in Twitter.
The paper describes our submissions to task 3 in SemEval-2018. There are two subtasks: Subtask A is a binary classification task to determine whether a tweet is ironic, and Subtask B is a fine-grained classification task including four classes. To address them, we explored supervised machine learning method alone and in combination with neural networks.
This paper describes the KLUEnicorn system submitted to the SemEval-2018 task on “Irony detection in English tweets”. The proposed system uses a naive Bayes classifier to exploit rather simple lexical, pragmatical and semantical features as well as sentiment. It further takes a closer look at different adverb categories and named entities and factors in word-embedding information.
In this paper we present two deep-learning systems that competed at SemEval-2018 Task 3 “Irony detection in English tweets”. We design and ensemble two independent models, based on recurrent neural networks (Bi-LSTM), which operate at the word and character level, in order to capture both the semantic and syntactic information in tweets. Our models are augmented with a self-attention mechanism, in order to identify the most informative words. The embedding layer of our word-level model is initialized with word2vec word embeddings, pretrained on a collection of 550 million English tweets. We did not utilize any handcrafted features, lexicons or external datasets as prior information and our models are trained end-to-end using back propagation on constrained data. Furthermore, we provide visualizations of tweets with annotations for the salient tokens of the attention layer that can help to interpret the inner workings of the proposed models. We ranked 2nd out of 42 teams in Subtask A and 2nd out of 31 teams in Subtask B. However, post-task-completion enhancements of our models achieve state-of-the-art results ranking 1st for both subtasks.
This paper describe the system we proposed to participate the first year of Irony detection in English tweets competition. Previous works demonstrate that LSTMs models have achieved remarkable performance in natural language processing; besides, combining multiple classification from various individual classifiers in general is more powerful than a single classification. In order to obtain more precision classification of irony detection, our system trained several individual neural network classifiers and combined their results according to the ensemble-learning algorithm.
In this paper, we describe the system submitted for the SemEval 2018 Task 3 (Irony detection in English tweets) Subtask A by the team Binarizer. Irony detection is a key task for many natural language processing works. Our method treats ironical tweets to consist of smaller parts containing different emotions. We break down tweets into separate phrases using a dependency parser. We then embed those phrases using an LSTM-based neural network model which is pre-trained to predict emoticons for tweets. Finally, we train a fully-connected network to achieve classification.
Sentiment analysis plays an important role in E-commerce. Identifying ironic and sarcastic content in text plays a vital role in inferring the actual intention of the user, and is necessary to increase the accuracy of sentiment analysis. This paper describes the work on identifying the irony level in twitter texts. The system developed by the SSN MLRG1 team in SemEval-2018 for task 3 (irony detection) uses rule based approach for feature selection and MultiLayer Perceptron (MLP) technique to build the model for multiclass irony classification subtask, which classifies the given text into one of the four class labels.
This paper describes our participation in SemEval 2018 Task 3 on Irony Detection in Tweets. We combine linguistic features with pre-trained activations of a neural network. The CNN is trained on the emoji prediction task. We combine the two feature sets and feed them into an XGBoost Classifier for classification. Subtask-A involves classification of tweets into ironic and non-ironic instances whereas Subtask-B involves classification of the tweet into - non-ironic, verbal irony, situational irony or other verbal irony. It is observed that combining features from these two different feature spaces improves our system results. We leverage the SMOTE algorithm to handle the problem of class imbalance in Subtask-B. Our final model achieves an F1-score of 0.65 and 0.47 on Subtask-A and Subtask-B respectively. Our system ranks 4th on both tasks respectively, outperforming the baseline by 6% on Subtask-A and 14% on Subtask-B.
In this paper we describe the system used by the ValenTO team in the shared task on Irony Detection in English Tweets at SemEval 2018. The system takes as starting point emotIDM, an irony detection model that explores the use of affective features based on a wide range of lexical resources available for English, reflecting different facets of affect. We experimented with different settings, by exploiting different classifiers and features, and participated both to the binary irony detection task and to the task devoted to distinguish among different types of irony. We report on the results obtained by our system both in a constrained setting and unconstrained setting, where we explored the impact of using additional data in the training phase, such as corpora annotated for the presence of irony or sarcasm from the state of the art. Overall, the performance of our system seems to validate the important role that affective information has for identifying ironic content in Twitter.
This paper describes the participation of the #NonDicevoSulSerio team at SemEval2018-Task3, which focused on Irony Detection in English Tweets and was articulated in two tasks addressing the identification of irony at different levels of granularity. We participated in both tasks proposed: Task A is a classical binary classification task to determine whether a tweet is ironic or not, while Task B is a multiclass classification task devoted to distinguish different types of irony, where systems have to predict one out of four labels describing verbal irony by clash, other verbal irony, situational irony, and non-irony. We addressed both tasks by proposing a model built upon a well-engineered features set involving both syntactic and lexical features, and a wide range of affective-based features, covering different facets of sentiment and emotions. The use of new features for taking advantage of the affective information conveyed by emojis has been analyzed. On this line, we also tried to exploit the possible incongruity between sentiment expressed in the text and in the emojis included in a tweet. We used a Support Vector Machine classifier, and obtained promising results. We also carried on experiments in an unconstrained setting.
Character identification is an entity-linking task that finds words referring to the same person among the nouns mentioned in a conversation and turns them into one entity. In this paper, we define a sequence-labeling problem to solve character identification, and propose an attention-based recurrent neural network (RNN) encoder–decoder model. The in-put document for character identification on multiparty dialogues consists of several conversations, which increase the length of the input sequence. The RNN encoder–decoder model suffers from poor performance when the length of the input sequence is long. To solve this problem, we propose applying position encoding and the self-matching network to the RNN encoder–decoder model. Our experimental results demonstrate that of the four models proposed, Model 2 showed an F1 score of 86.00% and a label accuracy of 85.10% at the scene-level.
In this paper, we describe the participation of the NewsReader system in the SemEval-2018 Task 5 on Counting Events and Participants in the Long Tail. NewsReader is a generic unsupervised text processing system that detects events with participants, time and place to generate Event Centric Knowledge Graphs (ECKGs). We minimally adapted these ECKGs to establish a baseline performance for the task. We first use the ECKGs to establish which documents report on the same incident and what event mentions are coreferential. Next, we aggregate ECKGs across coreferential mentions and use the aggregated knowledge to answer the questions of the task. Our participation tests the quality of NewsReader to create ECKGs, as well as the potential of ECKGs to establish event identity and reason over the result to answer the task queries.
We present the approach developed at the Faculty of Engineering of the University of Porto to participate in SemEval-2018 Task 5: Counting Events and Participants within Highly Ambiguous Data covering a very long tail.The work described here presents the experimental system developed to extract entities from news articles for the sake of Question Answering. We propose a supervised learning approach to enable the recognition of two different types of entities: Locations and Participants. We also discuss the use of distance-based algorithms (using Levenshtein distance and Q-grams) for the detection of documents’ closeness based on the entities extracted. For the experiments, we also used a multi-agent system that improved the performance.
In this paper, we describe Alibaba’s participating system in the semEval-2018 Task5: Counting Events and Participants in the Long Tail. We designed and implemented a pipeline system that consists of components to extract question properties and document features, document event category classifications, document retrieval and document clustering. To retrieve the majority of the relevant documents, we carefully designed our system to extract key information from each question and document pair. After retrieval, we perform further document clustering to count the number of events. The task contains 3 subtasks, on which we achieved F1 score of 78.33, 50.52, 63.59 , respectively, for document level retrieval. Our system ranks first in all the three subtasks on document level retrieval, and it also ranks first in incident-level evaluation by RSME measure in subtask 3.
This paper describes the first task on semantic relation extraction and classification in scientific paper abstracts at SemEval 2018. The challenge focuses on domain-specific semantic relations and includes three different subtasks. The subtasks were designed so as to compare and quantify the effect of different pre-processing steps on the relation classification results. We expect the task to be relevant for a broad range of researchers working on extracting specialized knowledge from domain corpora, for example but not limited to scientific or bio-medical information extraction. The task attracted a total of 32 participants, with 158 submissions across different scenarios.
Reliably detecting relevant relations between entities in unstructured text is a valuable resource for knowledge extraction, which is why it has awaken significant interest in the field of Natural Language Processing. In this paper, we present a system for relation classification and extraction based on an ensemble of convolutional and recurrent neural networks that ranked first in 3 out of the 4 Subtasks at SemEval 2018 Task 7. We provide detailed explanations and grounds for the design choices behind the most relevant features and analyze their importance.
This paper describes the SemEval 2018 shared task on semantic extraction from cybersecurity reports, which is introduced for the first time as a shared task on SemEval. This task comprises four SubTasks done incrementally to predict the characteristics of a specific malware using cybersecurity reports. To the best of our knowledge, we introduce the world’s largest publicly available dataset of annotated malware reports in this task. This task received in total 18 submissions from 9 participating teams.
This paper describes our submissions for SemEval-2018 Task 8: Semantic Extraction from CybersecUrity REports using NLP. The DM_NLP participated in two subtasks: SubTask 1 classifies if a sentence is useful for inferring malware actions and capabilities, and SubTask 2 predicts token labels (“Action”, “Entity”, “Modifier” and “Others”) for a given malware-related sentence. Since we leverage results of Subtask 2 directly to infer the result of Subtask 1, the paper focus on the system solving Subtask 2. By taking Subtask 2 as a sequence labeling task, our system relies on a recurrent neural network named BiLSTM-CNN-CRF with rich linguistic features, such as POS tags, dependency parsing labels, chunking labels, NER labels, Brown clustering. Our system achieved the highest F1 score in both token level and phrase level.
This paper describes the SemEval 2018 Shared Task on Hypernym Discovery. We put forward this task as a complementary benchmark for modeling hypernymy, a problem which has traditionally been cast as a binary classification task, taking a pair of candidate words as input. Instead, our reformulated task is defined as follows: given an input term, retrieve (or discover) its suitable hypernyms from a target corpus. We proposed five different subtasks covering three languages (English, Spanish, and Italian), and two specific domains of knowledge in English (Medical and Music). Participants were allowed to compete in any or all of the subtasks. Overall, a total of 11 teams participated, with a total of 39 different systems submitted through all subtasks. Data, results and further information about the task can be found at https://competitions.codalab.org/competitions/17119.
This report describes the system developed by the CRIM team for the hypernym discovery task at SemEval 2018. This system exploits a combination of supervised projection learning and unsupervised pattern-based hypernym discovery. It was ranked first on the 3 sub-tasks for which we submitted results.
This paper describes the SemEval 2018 Task 10 on Capturing Discriminative Attributes. Participants were asked to identify whether an attribute could help discriminate between two concepts. For example, a successful system should determine that ‘urine’ is a discriminating feature in the word pair ‘kidney’, ‘bone’. The aim of the task is to better evaluate the capabilities of state of the art semantic models, beyond pure semantic similarity. The task attracted submissions from 21 teams, and the best system achieved a 0.75 F1 score.
We present SUNNYNLP, our system for solving SemEval 2018 Task 10: “Capturing Discriminative Attributes”. Our Support-Vector-Machine(SVM)-based system combines features extracted from pre-trained embeddings and statistical information from Is-A taxonomy to detect semantic difference of concepts pairs. Our system is demonstrated to be effective in detecting semantic difference and is ranked 1st in the competition in terms of F1 measure. The open source of our code is coined SUNNYNLP.
This report summarizes the results of the SemEval 2018 task on machine comprehension using commonsense knowledge. For this machine comprehension task, we created a new corpus, MCScript. It contains a high number of questions that require commonsense knowledge for finding the correct answer. 11 teams from 4 different countries participated in this shared task, most of them used neural approaches. The best performing system achieves an accuracy of 83.95%, outperforming the baselines by a large margin, but still far from the human upper bound, which was found to be at 98%.
This paper describes our system for SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge. We use Three-way Attentive Networks (TriAN) to model interactions between the passage, question and answers. To incorporate commonsense knowledge, we augment the input with relation embedding from the graph of general knowledge ConceptNet. As a result, our system achieves state-of-the-art performance with 83.95% accuracy on the official test data. Code is publicly available at https://github.com/intfloat/commonsense-rc.
A natural language argument is composed of a claim as well as reasons given as premises for the claim. The warrant explaining the reasoning is usually left implicit, as it is clear from the context and common sense. This makes a comprehension of arguments easy for humans but hard for machines. This paper summarizes the first shared task on argument reasoning comprehension. Given a premise and a claim along with some topic information, the goal was to automatically identify the correct warrant among two candidates that are plausible and lexically close, but in fact imply opposite claims. We describe the dataset with 1970 instances that we built for the task, and we outline the 21 computational approaches that participated, most of which used neural networks. The results reveal the complexity of the task, with many approaches hardly improving over the random accuracy of about 0.5. Still, the best observed accuracy (0.712) underlines the principle feasibility of identifying warrants. Our analysis indicates that an inclusion of external knowledge is key to reasoning comprehension.
This paper describes our GIST team system that participated in SemEval-2018 Argument Reasoning Comprehension task (Task 12). Here, we address two challenging factors: unstated common senses and two lexically close warrants that lead to contradicting claims. A key idea for our system is full use of transfer learning from the Natural Language Inference (NLI) task to this task. We used Enhanced Sequential Inference Model (ESIM) to learn the NLI dataset. We describe how to use ESIM for transfer learning to choose correct warrant through a proposed system. We show comparable results through ablation experiments. Our system ranked 1st among 22 systems, outperforming all the systems more than 10%.
We present LightRel, a lightweight and fast relation classifier. Our goal is to develop a high baseline for different relation extraction tasks. By defining only very few data-internal, word-level features and external knowledge sources in the form of word clusters and word embeddings, we train a fast and simple linear classifier
We describe our system for SemEval-2018 Shared Task on Semantic Relation Extraction and Classification in Scientific Papers where we focus on the Classification task. Our simple piecewise convolution neural encoder performs decently in an end to end manner. A simple inter-task data augmentation significantly boosts the performance of the model. Our best-performing systems stood 8th out of 20 teams on the classification task on noisy data and 12th out of 28 teams on the classification task on clean data.
This paper describes our submission for SemEval 2018 Task 7 shared task on semantic relation extraction and classification in scientific papers. Our model is based on the end-to-end relation extraction model of (Miwa and Bansal, 2016) with several enhancements such as character-level encoding attention mechanism on selecting pretrained concept candidate embeddings. Our official submission ranked the second in relation classification task (Subtask 1.1 and Subtask 2 Senerio 2), and the first in the relation extraction task (Subtask 2 Scenario 1).
This paper reports our participation for SemEval-2018 Task 7 on extraction and classification of relationships between entities in scientific papers. Our approach is based on the use of a Convolutional Neural Network (CNN) trained on350 abstract with manually annotated entities and relations. Our hypothesis is that this deep learning model can be applied to extract and classify relations between entities for scientific papers at the same time. We use the Part-of-Speech and the distances to the target entities as part of the embedding for each word and we blind all the entities by marker names. In addition, we use sampling techniques to overcome the imbalance issues of this dataset. Our architecture obtained an F1-score of 35.4% for the relation extraction task and 18.5% for the relation classification task with a basic configuration of the one step CNN.
SemEval 2018 Task 7 tasked participants to build a system to classify two entities within a sentence into one of the 6 possible relation types. We tested 3 classes of models: Linear classifiers, Long Short-Term Memory (LSTM) models, and Convolutional Neural Network (CNN) models. Ultimately, the CNN model class proved most performant, so we specialized to this model for our final submissions. We improved performance beyond a vanilla CNN by including a variant of negative sampling, using custom word embeddings learned over a corpus of ACL articles, training over corpora of both tasks 1.1 and 1.2, using reversed feature, using part of context words beyond the entity pairs and using ensemble methods to improve our final predictions. We also tested attention based pooling, up-sampling, and data augmentation, but none improved performance. Our model achieved rank 6 out of 28 (macro-averaged F1-score: 72.7) in subtask 1.1, and rank 4 out of 20 (macro F1: 80.6) in subtask 1.2.
This article presents the SIRIUS-LTG-UiO system for the SemEval 2018 Task 7 on Semantic Relation Extraction and Classification in Scientific Papers. First we extract the shortest dependency path (sdp) between two entities, then we introduce a convolutional neural network (CNN) which takes the shortest dependency path embeddings as input and performs relation classification with differing objectives for each subtask of the shared task. This approach achieved overall F1 scores of 76.7 and 83.2 for relation classification on clean and noisy data, respectively. Furthermore, for combined relation extraction and classification on clean data, it obtained F1 scores of 37.4 and 33.6 for each phase. Our system ranks 3rd in all three sub-tasks of the shared task.
This paper presents our participation for sub-task1 (1.1 and 1.2) in SemEval 2018 task 7: Semantic Relation Extraction and Classification in Scientific Papers (Gábor et al., 2018). We experimented on this task with two methods: CNN method and traditional pipeline method. We use the context between two entities (included) as input information for both methods, which extremely reduce the noise effect. For the CNN method, we construct a simple convolution neural network to automatically learn features from raw texts without any manual processing. Moreover, we use the softmax function to classify the entity pair into a specific relation category. For the traditional pipeline method, we use the Hackabout method as a representation which is described in section3.5. The CNN method’s result is much better than traditional pipeline method (49.1% vs. 42.3% and 71.1% vs. 54.6% ).
Automatic extraction of semantic relations from text can support finding relevant information from scientific publications. We describe our participation in Task 7 of SemEval-2018 for which we experimented with two relations extraction tools - jSRE and TEES - for the extraction and classification of six relation types. The results we obtained with TEES were significantly superior than those with jSRE (33.4% vs. 30.09% and 20.3% vs. 16%). Additionally, we utilized the model trained with TEES for extracting semantic relations from biomedical abstracts, for which we present a preliminary evaluation.
In this work we evaluate applicability of entity pair models and neural network architectures for relation extraction and classification in scientific papers at SemEval-2018. We carry out experiments with representing entity pairs through sentence tokens and through shortest path in dependency tree, comparing approaches based on convolutional and recurrent neural networks. With convolutional network applied to shortest path in dependency tree we managed to be ranked eighth in subtask 1.1 (“clean data”), ninth in 1.2 (“noisy data”). Similar model applied to separate parts of the shortest path was mounted to ninth (extraction track) and seventh (classification track) positions in subtask 2 ranking.
Large repositories of scientific literature call for the development of robust methods to extract information from scholarly papers. This problem is addressed by the SemEval 2018 Task 7 on extracting and classifying relations found within scientific publications. In this paper, we present a feature-based and a deep learning-based approach to the task and discuss the results of the system runs that we submitted for evaluation.
SemEval 2018 Task 7 focuses on relation extraction and classification in scientific literature. In this work, we present our tree-based LSTM network for this shared task. Our approach placed 9th (of 28) for subtask 1.1 (relation classification), and 5th (of 20) for subtask 1.2 (relation classification with noisy entities). We also provide an ablation study of features included as input to the network.
In this paper we describe our system for SemEval-2018 Task 7 on classification of semantic relations in scientific literature for clean (subtask 1.1) and noisy data (subtask 1.2). We compare two models for classification, a C-LSTM which utilizes only word embeddings and an SVM that also takes handcrafted features into account. To adapt to the domain of science we train word embeddings on scientific papers collected from arXiv.org. The hand-crafted features consist of lexical features to model the semantic relations as well as the entities between which the relation holds. Classification of Relations using Embeddings (ClaiRE) achieved an F1 score of 74.89% for the first subtask and 78.39% for the second.
We describe two systems for semantic relation classification with which we participated in the SemEval 2018 Task 7, subtask 1 on semantic relation classification: an SVM model and a CNN model. Both models combine dense pretrained word2vec features and hancrafted sparse features. For training the models, we combine the two datasets provided for the subtasks in order to balance the under-represented classes. The SVM model performed better than CNN, achieving a F1-macro score of 69.98% on subtask 1.1 and 75.69% on subtask 1.2. The system ranked 7th on among 28 submissions on subtask 1.1 and 7th among 20 submissions on subtask 1.2.
Discovering semantic relations within textual documents is a timely topic worthy of investigation. Natural language processing strategies are generally used for linking chunks of text in order to extract information that can be exploited by semantic search engines for performing complex queries. The scientific domain is an interesting area where these techniques can be applied. In this paper, we describe a system based on neural networks applied to the SemEval 2018 Task 7. The system relies on the use of word embeddings for composing the vectorial representation of text chunks. Such representations are used for feeding a neural network aims to learn the structure of paths connecting chunks associated with a specific relation. Preliminary results demonstrated the suitability of the proposed approach encouraging the investigation of this research direction.
This paper describes our system, SciREL (Scientific abstract RELation extraction system), developed for the SemEval 2018 Task 7: Semantic Relation Extraction and Classification in Scientific Papers. We present a feature-vector based system to extract explicit semantic relation and classify them. Our system is trained in the ACL corpus (BIrd et al., 2008) that contains annotated abstracts given by the task organizers. When an abstract with annotated entities is given as the input into our system, it extracts the semantic relations through a set of defined features and classifies them into one of the given six categories of relations through feature engineering and a learned model. For the best combination of features, our system SciREL obtained an F-measure of 20.03 on the official test corpus which includes 150 abstracts in the relation classification Subtask 1.1. In this paper, we provide an in-depth error analysis of our results to prevent duplication of research efforts in the development of future systems
The paper presents NTNU’s contribution to SemEval-2018 Task 7 on relation identification and classification. The class weights and parameters of five alternative supervised classifiers were optimized through grid search and cross-validation. The outputs of the classifiers were combined through voting for the final prediction. A wide variety of features were explored, with the most informative identified by feature selection. The best setting achieved F1 scores of 47.4% and 66.0% in the relation classification subtasks 1.1 and 1.2. For relation identification and classification in subtask 2, it achieved F1 scores of 33.9% and 17.0%,
This paper describes our approach to SemEval-2018 Task 7 – given an entity-tagged text from the ACL Anthology corpus, identify and classify pairs of entities that have one of six possible semantic relationships. Our model consists of a convolutional neural network leveraging pre-trained word embeddings, unlabeled ACL-abstracts, and multiple window sizes to automatically learn useful features from entity-tagged sentences. We also experiment with a hybrid loss function, a combination of cross-entropy loss and ranking loss, to boost the separation in classification scores. Lastly, we include WordNet-based features to further improve the performance of our model. Our best model achieves an F1(macro) score of 74.2 and 84.8 on subtasks 1.1 and 1.2, respectively.
In this work we present our participation to SemEval-2018 Task 8 subtasks 1 & 2 respectively. We developed Convolution Neural Network system for malware sentence classification (subtask 1) and Conditional Random Fields system for malware token label prediction (subtask 2). We experimented with couple of word embedding strategies, feature sets and achieved competitive performance across the two subtasks. For subtask 1 We experimented with two category of word embeddings namely native embeddings and task specific embedding using Word2vec and Glove algorithms. 1. Native Embeddings: All words including the unknown ones that are randomly initialized use embeddings from original Word2vec/Glove models. 2. Task specific : The embeddings are generated by training Word2vec/Glove algorithms on sentences from MalwareTextDB We found that glove outperforms rest of embeddings for subtask 1. For subtask 2, we used N-grams of size 6, previous, next tokens and labels, features giving disjunctions of words anywhere in the left or right, word shape features, word lemma of current, previous and next words, word-tag pair features, POS tags, prefix and suffixes.
This paper describes HCCL team systems that participated in SemEval 2018 Task 8: SecureNLP (Semantic Extraction from cybersecurity reports using NLP). To solve the problem, our team applied a neural network architecture that benefits from both word and character level representaions automatically, by using combination of Bi-directional LSTM, CNN and CRF (Ma and Hovy, 2016). Our system is truly end-to-end, requiring no feature engineering or data preprocessing, and we ranked 4th in the subtask 1, 7th in the subtask2 and 3rd in the SubTask2-relaxed.
We describe the systems developed by the UMBC team for 2018 SemEval Task 8, SecureNLP (Semantic Extraction from CybersecUrity REports using Natural Language Processing). We participated in three of the sub-tasks: (1) classifying sentences as being relevant or irrelevant to malware, (2) predicting token labels for sentences, and (4) predicting attribute labels from the Malware Attribute Enumeration and Characterization vocabulary for defining malware characteristics. We achieve F1 score of 50.34/18.0 (dev/test), 22.23 (test-data), and 31.98 (test-data) for Task1, Task2 and Task2 respectively. We also make our cybersecurity embeddings publicly available at http://bit.ly/cyber2vec.
In this paper, we describe our proposal for the task of Semantic Extraction from Cybersecurity Reports. The goal is to explore if natural language processing methods can provide relevant and actionable knowledge to contribute to better understand malicious behavior. Our method consists of an attention-based Bi-LSTM which achieved competitive performance of 0.57 for the Subtask 1. In the due process we also present ablation studies across multiple embeddings and their level of representation and also report the strategies we used to mitigate the extreme imbalance between classes.
Cybersecurity risks such as malware threaten the personal safety of users, but to identify malware text is a major challenge. The paper proposes a supervised learning approach to identifying malware sentences given a document (subTask1 of SemEval 2018, Task 8), as well as to classifying malware tokens in the sentences (subTask2). The approach achieved good results, ranking second of twelve participants for both subtasks, with F-scores of 57% for subTask1 and 28% for subTask2.
The four sub-tasks of SecureNLP build towards a capability for quickly highlighting critical information from malware reports, such as the specific actions taken by a malware sample. Digital Operatives (DO) submitted to sub-tasks 1 and 2, using standard text analysis technology (text classification for sub-task 1, and a CRF for sub-task 2). Performance is broadly competitive with other submitted systems on sub-task 1 and weak on sub-task 2. The annotation guidelines for the intermediate sub-tasks create a linkage to the final task, which is both an annotation challenge and a potentially useful feature of the task. The methods that DO chose do not attempt to make use of this linkage, which may be a missed opportunity. This motivates a post-hoc error analysis. It appears that the annotation task is very hard, and that in some cases both deep conceptual knowledge and substantial surrounding context are needed in order to correctly classify sentences.
This paper presents the participation of Apollo’s team in the SemEval-2018 Task 9 “Hypernym Discovery”, Subtask 1: “General-Purpose Hypernym Discovery”, which tries to produce a ranked list of hypernyms for a specific term. We propose a novel approach for automatic extraction of hypernymy relations from a corpus by using dependency patterns. We estimated that the application of these patterns leads to a higher score than using the traditional lexical patterns.
This paper describes a hypernym discovery system for our participation in the SemEval-2018 Task 9, which aims to discover the best (set of) candidate hypernyms for input concepts or entities, given the search space of a pre-defined vocabulary. We introduce a neural network architecture for the concerned task and empirically study various neural network models to build the representations in latent space for words and phrases. The evaluated models include convolutional neural network, long-short term memory network, gated recurrent unit and recurrent convolutional neural network. We also explore different embedding methods, including word embedding and sense embedding for better performance.
Hypernym discovery aims to discover the hypernym word sets given a hyponym word and proper corpus. This paper proposes a simple but effective method for the discovery of hypernym sets based on word embedding, which can be used to measure the contextual similarities between words. Given a test hyponym word, we get its hypernym lists by computing the similarities between the hyponym word and words in the training data, and fill the test word’s hypernym lists with the hypernym list in the training set of the nearest similarity distance to the test word. In SemEval 2018 task9, our results, achieve 1st on Spanish, 2nd on Italian, 6th on English in the metric of MAP.
Hypernym Discovery is the task of identifying potential hypernyms for a given term. A hypernym is a more generalized word that is super-ordinate to more specific words. This paper explores several approaches that rely on co-occurrence frequencies of word pairs, Hearst Patterns based on regular expressions, and word embeddings created from the UMBC corpus. Our system Babbage participated in Subtask 1A for English and placed 6th of 19 systems when identifying concept hypernyms, and 12th of 18 systems for entity hypernyms.
In this paper, we present our proposed system (EXPR) to participate in the hypernym discovery task of SemEval 2018. The task addresses the challenge of discovering hypernym relations from a text corpus. Our proposal is a combined approach of path-based technique and distributional technique. We use dependency parser on a corpus to extract candidate hypernyms and represent their dependency paths as a feature vector. The feature vector is concatenated with a feature vector obtained using Wikipedia pre-trained term embedding model. The concatenated feature vector fits a supervised machine learning method to learn a classifier model. This model is able to classify new candidate hypernyms as hypernym or not. Our system performs well to discover new hypernyms not defined in gold hypernyms.
This paper describes a simple but competitive unsupervised system for hypernym discovery. The system uses skip-gram word embeddings with negative sampling, trained on specialised corpora. Candidate hypernyms for an input word are predicted based based on cosine similarity scores. Two sets of word embedding models were trained separately on two specialised corpora: a medical corpus and a music industry corpus. Our system scored highest in the medical domain among the competing unsupervised systems but performed poorly on the music industry domain. Our system does not depend on any external data other than raw specialised corpora.
This paper describes 300-sparsians’s participation in SemEval-2018 Task 9: Hypernym Discovery, with a system based on sparse coding and a formal concept hierarchy obtained from word embeddings. Our system took first place in subtasks (1B) Italian (all and entities), (1C) Spanish entities, and (2B) music entities.
We present our UWB system for the task of capturing discriminative attributes at SemEval 2018. Given two words and an attribute, the system decides, whether this attribute is discriminative between the words or not. Assuming Distributional Hypothesis, i.e., a word meaning is related to the distribution across contexts, we introduce several approaches to compare word contextual information. We experiment with state-of-the-art semantic spaces and with simple co-occurrence statistics. We show the word distribution in the corpus has potential for detecting discriminative attributes. Our system achieves F1 score 72.1% and is ranked #4 among 26 submitted systems.
This paper presents the two systems submitted by the meaning space team in Task 10 of the SemEval competition 2018 entitled Capturing discriminative attributes. The systems consist of combinations of approaches exploiting explicitly encoded knowledge about concepts in WordNet and information encoded in distributional semantic vectors. Rather than aiming for high performance, we explore which kind of semantic knowledge is best captured by different methods. The results indicate that WordNet glosses on different levels of the hierarchy capture many attributes relevant for this task. In combination with exploiting word embedding similarities, this source of information yielded our best results. Our best performing system ranked 5th out of 13 final ranks. Our analysis yields insights into the different kinds of attributes represented by different sources of knowledge.
This paper describes our system submission to the SemEval 2018 Task 10 on Capturing Discriminative Attributes. Given two concepts and an attribute, the task is to determine whether the attribute is semantically related to one concept and not the other. In this work we assume that discriminative attributes can be detected by discovering the association (or lack of association) between a pair of words. The hypothesis we test in this contribution is whether the semantic difference between two pairs of concepts can be treated in terms of measuring the distance between words in a vector space, or can simply be obtained as a by-product of word co-occurrence counts.
This article describes the unsupervised strategy submitted by the CitiusNLP team to the SemEval 2018 Task 10, a task which consists of predict whether a word is a discriminative attribute between two other words. Our strategy relies on the correspondence between discriminative attributes and relevant contexts of a word. More precisely, the method uses transparent distributional models to extract salient contexts of words which are identified as discriminative attributes. The system performance reaches about 70% accuracy when it is applied on the development dataset, but its accuracy goes down (63%) on the official test dataset.
Existing semantic models are capable of identifying the semantic similarity of words. However, it’s hard for these models to discriminate between a word and another similar word. Thus, the aim of SemEval-2018 Task 10 is to predict whether a word is a discriminative attribute between two concepts. In this task, we apply a multilayer perceptron (MLP)-convolutional neural network (CNN) model to identify whether an attribute is discriminative. The CNNs are used to extract low-level features from the inputs. The MLP takes both the flatten CNN maps and inputs to predict the labels. The evaluation F-score of our system on the test set is 0.629 (ranked 15th), which indicates that our system still needs to be improved. However, the behaviours of our system in our experiments provide useful information, which can help to improve the collective understanding of this novel task.
Semantic difference detection attempts to capture whether a word is a discriminative attribute between two other words. For example, the discriminative feature red characterizes the first word from the (apple, banana) pair, but not the second. Modeling semantic difference is essential for language understanding systems, as it provides useful information for identifying particular aspects of word senses. This paper describes our system implementation (the ALB system of the NLP@Unibuc team) for the 10th task of the SemEval 2018 workshop, “Capturing Discriminative Attributes”. We propose a method for semantic difference detection that uses an SVM classifier with features based on co-occurrence counts and shallow semantic parsing, achieving 0.63 F1 score in the competition.
This paper describes the participation of ELiRF-UPV team at task 10, Capturing Discriminative Attributes, of SemEval-2018. Our best approach consists of using ConceptNet, Wikipedia and NumberBatch embeddings in order to stablish relationships between concepts and attributes. Furthermore, this system achieves competitive results in the official evaluation.
This paper describes the system submitted to SemEval 2018 shared task 10 ‘Capturing Dicriminative Attributes’. We use a combination of knowledge-based and co-occurrence features to capture the semantic difference between two words in relation to an attribute. We define scores based on association measures, ngram counts, word similarity, and ConceptNet relations. The system is ranked 4th (joint) on the official leaderboard of the task.
In this paper we report an unsupervised method aimed to identify whether an attribute is discriminative for two words (which are treated as concepts, in our particular case). To this end, we use geometrically inspired vector operations underlying unsupervised decision functions. These decision functions operate on state-of-the-art neural word embeddings of the attribute and the concepts. The main idea can be described as follows: if attribute q discriminates concept a from concept b, then q is excluded from the feature set shared by these two concepts: the intersection. That is, the membership q∈ (a∩ b) does not hold. As a,b,q are represented with neural word embeddings, we tested vector operations allowing us to measure membership, i.e. fuzzy set operations (t-norm, for fuzzy intersection, and t-conorm, for fuzzy union) and the similarity between q and the convex cone described by a and b.
Luminoso participated in the SemEval 2018 task on “Capturing Discriminative Attributes” with a system based on ConceptNet, an open knowledge graph focused on general knowledge. In this paper, we describe how we trained a linear classifier on a small number of semantically-informed features to achieve an F1 score of 0.7368 on the task, close to the task’s high score of 0.75.
This paper describes BomJi, a supervised system for capturing discriminative attributes in word pairs (e.g. yellow as discriminative for banana over watermelon). The system relies on an XGB classifier trained on carefully engineered graph-, pattern- and word embedding-based features. It participated in the SemEval-2018 Task 10 on Capturing Discriminative Attributes, achieving an F1 score of 0.73 and ranking 2nd out of 26 participant systems.
This paper presents a comparison of several approaches for capturing discriminative attributes and considers an impact of concatenation of several word embeddings of different nature on the classification performance. A similarity-based method is proposed and compared with classical machine learning approaches. It is shown that this method outperforms others on all the considered word vector models and there is a performance increase when concatenated datasets are used.
This paper describes the system we submitted to Task 10 (Capturing Discriminative Attributes) in SemEval 2018. Given a triple (word1, word2, attribute), this task is to predict whether it exemplifies a semantic difference or not. We design and investigate several word embedding features, PMI features and WordNet features together with supervised machine learning methods to address this task. Officially released results show that our system ranks above average.
The “Capturing Discriminative Attributes” sharedtask is the tenth task, conjoint with SemEval2018. The task is to predict if a word can capture distinguishing attributes of one word from another. We use GloVe word embedding, pre-trained on openly sourced corpus for this task. A base representation is initially established over varied dimensions. These representations are evaluated based on validation scores over two models, first on an SVM based classifier and second on a one dimension CNN model. The scores are used to further develop the representation with vector combinations, by considering various distance measures. These measures correspond to offset vectors which are concatenated as features, mainly to improve upon the F1score, with the best accuracy. The features are then further tuned on the validation scores, to achieve highest F1score. Our evaluation narrowed down to two representations, classified on CNN models, having a total dimension length of 1204 & 1203 for the final submissions. Of the two, the latter feature representation delivered our best F1score of 0.658024 (as per result).
We participated to the SemEval-2018 shared task on capturing discriminative attributes (Task 10) with a simple system that ranked 8th amongst the 26 teams that took part in the evaluation. Our final score was 0.67, which is competitive with the winning score of 0.75, particularly given that our system is a zero-shot system that requires no training and minimal parameter optimisation. In addition to describing the submitted system, and discussing the implications of the relative success of such a system on this task, we also report on other, more complex models we experimented with.
In this paper we present three unsupervised models for capturing discriminative attributes based on information from word embeddings, WordNet, and sentence-level word co-occurrence frequency. We show that, of these approaches, the simple approach based on word co-occurrence performs best. We further consider supervised and unsupervised approaches to combining information from these models, but these approaches do not improve on the word co-occurrence model.
This paper describes the system that we submitted for SemEval-2018 task 10: capturing discriminative attributes. Our system is built upon a simple idea of measuring the attribute word’s similarity with each of the two semantically similar words, based on an extended word embedding method and WordNet. Instead of computing the similarities between the attribute and semantically similar words by using standard word embeddings, we propose a novel method that combines word and context embeddings which can better measure similarities. Our model is simple and effective, which achieves an average F1 score of 0.62 on the test set.
We describe the University of Maryland’s submission to SemEval-018 Task 10, “Capturing Discriminative Attributes”: given word triples (w1, w2, d), the goal is to determine whether d is a discriminating attribute belonging to w1 but not w2. Our study aims to determine whether word embeddings can address this challenging task. Our submission casts this problem as supervised binary classification using only word embedding features. Using a gaussian SVM model trained only on validation data results in an F-score of 60%. We also show that cosine similarity features are more effective, both in unsupervised systems (F-score of 65%) and supervised systems (F-score of 67%).
This paper presents the NTU NLP Lab system for the SemEval-2018 Capturing Discriminative Attributes task. Word embeddings, pointwise mutual information (PMI), ConceptNet edges and shortest path lengths are utilized as input features to build binary classifiers to tell whether an attribute is discriminative for a pair of concepts. Our neural network model reaches about 73% F1 score on the test set and ranks the 3rd in the task. Though the attributes to deal with in this task are all visual, our models are not provided with any image data. The results indicate that visual information can be derived from textual data.
This paper describes the participation of ELiRF-UPV team at task 11, Machine Comprehension using Commonsense Knowledge, of SemEval-2018. Our approach is based on the use of word embeddings, NumberBatch Embeddings, and a Deep Learning architecture to find the best answer for the multiple-choice questions based on the narrative text. The results obtained are in line with those obtained by the other participants and they encourage us to continue working on this problem.
In this paper, we describe a machine reading comprehension system that participated in SemEval-2018 Task 11: Machine Comprehension using commonsense knowledge. In this work, we train a series of neural network models such as multi-LSTM, BiLSTM, multi- BiLSTM-CNN and attention-based BiLSTM, etc. On top of some sub models, there are two kinds of word embedding: (a) general word embedding generated from unsupervised neural language model; and (b) position embedding generated from general word embedding. Finally, we make a hard vote on the predictions of these models and achieve relatively good result. The proposed approach achieves 8th place in Task 11 with the accuracy of 0.7213.
We firstly use GloVe to learn the distributed representations automatically from the instance, question and answer triples. Then an attentionbased Bidirectional LSTM (BiLSTM) model is used to encode the triples. We also perform a simple ensemble method to improve the effectiveness of our model. The system we developed obtains an encouraging result on this task. It achieves the accuracy 0.7472 on the test set. We rank 5th according to the official ranking.
This paper describes the system we submitted to the Task 11 in SemEval 2018, i.e., Machine Comprehension using Commonsense Knowledge. Given a passage and some questions that each have two candidate answers, this task requires the participate system to select out one answer meet the meaning of original text or commonsense knowledge from the candidate answers. For this task, we use a deep learning method to obtain final predict answer by calculating relevance of choices representations and question-aware document representation.
In this document we present an end-to-end machine reading comprehension system that solves multiple choice questions with a textual entailment perspective. Since some of the knowledge required is not explicitly mentioned in the text, we try to exploit commonsense knowledge by using pretrained word embeddings during contextual embeddings and by dynamically generating a weighted representation of related script knowledge. In the model two kinds of prediction structure are ensembled, and the final accuracy of our system is 10 percent higher than the naiive baseline.
This shared task is a typical question answering task. Compared with the normal question and answer system, it needs to give the answer to the question based on the text provided. The essence of the problem is actually reading comprehension. Typically, there are several questions for each text that correspond to it. And for each question, there are two candidate answers (and only one of them is correct). To solve this problem, the usual approach is to use convolutional neural networks (CNN) and recurrent neural network (RNN) or their improved models (such as long short-term memory (LSTM)). In this paper, an attention-based CNN-LSTM model is proposed for this task. By adding an attention mechanism and combining the two models, this experimental result has been significantly improved.
This paper describes our submission for the International Workshop on Semantic Evaluation (SemEval-2018) shared task 11– Machine Comprehension using Commonsense Knowledge (Ostermann et al., 2018b). We use a deep neural network model to choose the correct answer from the candidate answers pair when the document and question are given. The interactions between document, question and answers are modeled by attention mechanism and a variety of manual features are used to improve model performance. We also use CoVe (McCann et al., 2017) as an external source of knowledge which is not mentioned in the document. As a result, our system achieves 80.91% accuracy on the test data, which is on the third place of the leaderboard.
This paper describes the IUCM entry at SemEval-2018 Task 11, on machine comprehension using commonsense knowledge. First, clustering and topic modeling are used to divide given texts into topics. Then, during the answering phase, other texts of the same topic are retrieved and used as commonsense knowledge. Finally, the answer is selected. While clustering itself shows good results, finding an answer proves to be more challenging. This paper reports the results of system evaluation and suggests potential improvements.
Machine Comprehension of text is a typical Natural Language Processing task which remains an elusive challenge. This paper is to solve the task 11 of SemEval-2018, Machine Comprehension using Commonsense Knowledge task. We use deep learning model to solve the problem. We build distributed word embedding of text, question and answering respectively instead of manually extracting features by linguistic tools. Meanwhile, we use a series of frameworks such as CNN model, LSTM model, LSTM with attention model and biLSTM with attention model for processing word vector. Experiments demonstrate the superior performance of biLSTM with attention framework compared to other models. We also delete high frequency words and combine word vector and data augmentation methods, achieved a certain effect. The approach we proposed rank 6th in official results, with accuracy rate of 0.7437 in test dataset.
This paper describes MITRE’s participation in SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge. The techniques explored range from simple bag-of-ngrams classifiers to neural architectures with varied attention and alignment mechanisms. Logistic regression ties the systems together into an ensemble submitted for evaluation. The resulting system answers reading comprehension questions with 82.27% accuracy.
We present a novel neural architecture for the Argument Reasoning Comprehension task of SemEval 2018. It is a simple neural network consisting of three parts, collectively judging whether the logic built on a set of given sentences (a claim, reason, and warrant) is plausible or not. The model utilizes contextualized word vectors pre-trained on large machine translation (MT) datasets as a form of transfer learning, which can help to mitigate the lack of training data. Quantitative analysis shows that simply leveraging LSTMs trained on MT datasets outperforms several baselines and non-transferred models, achieving accuracies of about 70% on the development set and about 60% on the test set.
Reasoning is a very important topic and has many important applications in the field of natural language processing. Semantic Evaluation (SemEval) 2018 Task 12 “The Argument Reasoning Comprehension” committed to research natural language reasoning. In this task, we proposed a novel argument reasoning comprehension system, ITNLP-ARC, which use Neural Networks technology to solve this problem. In our system, the LSTM model is involved to encode both the premise sentences and the warrant sentences. The attention model is used to merge the two premise sentence vectors. Through comparing the similarity between the attention vector and each of the two warrant vectors, we choose the one with higher similarity as our system’s final answer.
This paper presents our submissions to SemEval 2018 Task 12: the Argument Reasoning Comprehension Task. We investigate an end-to-end attention-based neural network to represent the two lexically close candidate warrants. On the one hand, we extract their different parts as attention vectors to obtain distinguishable representations. On the other hand, we use their surrounds (i.e., claim, reason, debate context) as another attention vectors to get contextual representations, which work as final clues to select the correct warrant. Our model achieves 60.4% accuracy and ranks 3rd among 22 participating systems.
The Argument Reasoning Comprehension Task is a difficult challenge requiring significant language understanding and complex reasoning over world knowledge. We focus on transfer of a sentence encoder to bootstrap more complicated architectures given the small size of the dataset. Our best model uses a pre-trained BiLSTM to encode input sentences, learns task-specific features for the argument and warrants, then performs independent argument-warrant matching. This model achieves mean test set accuracy of 61.31%. Encoder transfer yields a significant gain to our best model over random initialization. Sharing parameters for independent warrant evaluation provides regularization and effectively doubles the size of the dataset. We demonstrate that regularization comes from ignoring statistical correlations between warrant positions. We also report an experiment with our best model that only matches warrants to reasons, ignoring claims. Performance is still competitive, suggesting that our model is not necessarily learning the intended task.
To comprehend an argument and fill the gap between claims and reasons, it is vital to find the implicit supporting warrants behind. In this paper, we propose a hierarchical attention model to identify the right warrant which explains why the reason stands for the claim. Our model focuses not only on the similar part between warrants and other information but also on the contradictory part between two opposing warrants. In addition, we use the ensemble method for different models. Our model achieves an accuracy of 61%, ranking second in this task. Experimental results demonstrate that our model is effective to make correct choices.
An argument is divided into two parts, the claim and the reason. To obtain a clearer conclusion, some additional explanation is required. In this task, the explanations are called warrants. This paper introduces a bi-directional long short term memory (Bi-LSTM) with an attention model to select a correct warrant from two to explain an argument. We address this question as a question-answering system. For each warrant, the model produces a probability that it is correct. Finally, the system chooses the highest correct probability as the answer. Ensemble learning is used to enhance the performance of the model. Among all of the participants, we ranked 15th on the test results.
This paper describes our participation in the SemEval-2018 Task 12 Argument Reasoning Comprehension Task which calls to develop systems that, given a reason and a claim, predict the correct warrant from two opposing options. We decided to use a deep learning architecture and combined 623 models with different hyperparameters into an ensemble. Our extensive analysis of our architecture and ensemble reveals that the decision to use an ensemble was suboptimal. Additionally, we benchmark a support vector machine as a baseline. Furthermore, we experimented with an alternative data split and achieved more stable results.
This paper describes the system submitted to SemEval-2018 Task 12 (The Argument Reasoning Comprehension Task). Enabling a computer to understand a text so that it can answer comprehension questions is still a challenging goal of NLP. We propose a Bidirectional LSTM (BiLSTM) model that reads two sentences separated by a delimiter to determine which warrant is correct. We extend this model with a neural attention mechanism that encourages the model to make reasoning over the given claims and reasons. Officially released results show that our system ranks 6th among 22 submissions to this task.
This paper describes a warrant classification system for SemEval 2018 Task 12, that attempts to learn semantic representations of reasons, claims and warrants. The system consists of 3 stacked LSTMs: one for the reason, one for the claim, and one shared Siamese Network for the 2 candidate warrants. Our main contribution is to force the embeddings into a shared feature space using vector operations, semantic similarity classification, Siamese networks, and multi-task learning. In doing so, we learn a form of generative implication, in encoding implication interrelationships between reasons, claims, and the associated correct and incorrect warrants. We augment the limited data in the task further by utilizing WordNet synonym “fuzzing”. When applied to SemEval 2018 Task 12, our system performs well on the development data, and officially ranked 8th among 21 teams.
This paper describes a classification system that participated in the SemEval-2018 Task 12: The Argument Reasoning Comprehension Task. Briefly the task can be described as that a natural language “argument” is what we have, with reason, claim, and correct and incorrect warrants, and we need to choose the correct warrant. In order to make fully understand of the semantic information of the sentences, we proposed a neural network architecture with attention mechanism to achieve this goal. Besides we try to introduce keywords into the model to improve accuracy. Finally the proposed system achieved 5th place among 22 participating systems
This paper describes our system for the SemEval-2018 Task 12: Argument Reasoning Comprehension Task. We utilize skip-thought vectors, sentence-level distributional vectors inspired by the popular word embeddings and the skip-gram model. We encode preprocessed sentences from the dataset into vectors, then perform a binary supervised classification of the warrant that justifies the use of the reason as support for the claim. We explore a few variations of the model, reaching 54.1% accuracy on the test set, which placed us 16th out of 22 teams participating in the task.
Reasoning is a crucial part of natural language argumentation. In order to comprehend an argument, we have to reconstruct and analyze its reasoning. In this task, given a natural language argument with a reason and a claim, the goal is to choose the correct implicit reasoning from two options, in order to form a reasonable structure of (Reason, Warrant, Claim). Our approach is to build distributed word embedding of reason, warrant and claim respectively, meanwhile, we use a series of frameworks such as CNN model, LSTM model, GRU with attention model and biLSTM with attention model for processing word vector. Finally, ensemble mechanism is used to integrate the results of each framework to improve the final accuracy. Experiments demonstrate superior performance of ensemble mechanism compared to each separate framework. We are the 11th in official results, the final model can reach a 0.568 accuracy rate on the test dataset.
This paper describes our system in SemEval-2018 task 12: Argument Reasoning Comprehension. The task is to select the correct warrant that explains reasoning of a particular argument consisting of a claim and a reason. The main idea of our methods is based on the assumption that the semantic composition of the reason and the warrant should be close to the semantic representation of the corresponding claim. We propose two neural network models. The first one considers two warrant candidates simultaneously, while the second one processes each candidate separately and then chooses the best one. We also incorporate sentiment polarity by assuming that there are kinds of sentiment associations between the reason, the warrant and the claim. The experiments show that the first framework is more effective and sentiment polarity is useful.
We present an approach to event coreference resolution by developing a general framework for clustering that uses supervised representation learning. We propose a neural network architecture with novel Clustering-Oriented Regularization (CORE) terms in the objective function. These terms encourage the model to create embeddings of event mentions that are amenable to clustering. We then use agglomerative clustering on these embeddings to build event coreference chains. For both within- and cross-document coreference on the ECB+ corpus, our model obtains better results than models that require significantly more pre-annotated information. This work provides insight and motivating results for a new general approach to solving coreference and clustering problems with representation learning.
Human world knowledge contains information about prototypical events and their participants and locations. In this paper, we train the first models using multi-task learning that can both predict missing event participants and also perform semantic role classification based on semantic plausibility. Our best-performing model is an improvement over the previous state-of-the-art on thematic fit modelling tasks. The event embeddings learned by the model can additionally be used effectively in an event similarity task, also outperforming the state-of-the-art.
This paper presents a collection to assess meaning components in German complex verbs, which frequently undergo meaning shifts. We use a novel strategy to obtain source and target domain characterisations via sentence generation rather than sentence annotation. A selection of arrows adds spatial directional information to the generated contexts. We provide a broad qualitative description of the dataset, and a series of standard classification experiments verifies the quantitative reliability of the presented resource. The setup for collecting the meaning components is applicable also to other languages, regarding complex verbs as well as other language-specific targets that involve meaning shifts.
Measuring the salience of a word is an essential step in numerous NLP tasks. Heuristic approaches such as tfidf have been used so far to estimate the salience of words. We propose Neural Word Salience (NWS) scores, unlike heuristics, are learnt from a corpus. Specifically, we learn word salience scores such that, using pre-trained word embeddings as the input, can accurately predict the words that appear in a sentence, given the words that appear in the sentences preceding or succeeding that sentence. Experimental results on sentence similarity prediction show that the learnt word salience scores perform comparably or better than some of the state-of-the-art approaches for representing sentences on benchmark datasets for sentence similarity, while using only a fraction of the training and prediction times required by prior methods. Moreover, our NWS scores positively correlate with psycholinguistic measures such as concreteness, and imageability implying a close connection to the salience as perceived by humans.
Automatic machine learning systems can inadvertently accentuate and perpetuate inappropriate human biases. Past work on examining inappropriate biases has largely focused on just individual systems. Further, there is no benchmark dataset for examining inappropriate biases in systems. Here for the first time, we present the Equity Evaluation Corpus (EEC), which consists of 8,640 English sentences carefully chosen to tease out biases towards certain races and genders. We use the dataset to examine 219 automatic sentiment analysis systems that took part in a recent shared task, SemEval-2018 Task 1 ‘Affect in Tweets’. We find that several of the systems show statistically significant bias; that is, they consistently provide slightly higher sentiment intensity predictions for one race or one gender. We make the EEC freely available.
This paper describes CCG/AMR, a novel grammar for semantic parsing of Abstract Meaning Representations. CCG/AMR equips Combinatory Categorial Grammar derivations with graph semantics by assigning each CCG combinator an interpretation in terms of a graph algebra. We provide an algorithm that induces a CCG/AMR from a corpus and show that it creates a compact lexicon with low ambiguity and achieves a robust coverage of 78% of the examined sentences under ideal conditions. We also identify several phenomena that affect any approach relying either on CCG or graph algebraic approaches for AMR parsing. This includes differences of representation between CCG and AMR, as well as non-compositional constructions that are not expressible through a monotonous construction process. To our knowledge, this paper provides the first analysis of these corpus issues.
The first stage of every knowledge base question answering approach is to link entities in the input question. We investigate entity linking in the context of question answering task and present a jointly optimized neural architecture for entity mention detection and entity disambiguation that models the surrounding context on different levels of granularity. We use the Wikidata knowledge base and available question answering datasets to create benchmarks for entity linking on question answering data. Our approach outperforms the previous state-of-the-art system on this data, resulting in an average 8% improvement of the final score. We further demonstrate that our model delivers a strong performance across different entity categories.
Across disciplines, researchers are eager to gain insight into empirical features of abstract vs. concrete concepts. In this work, we provide a detailed characterisation of the distributional nature of abstract and concrete words across 16,620 English nouns, verbs and adjectives. Specifically, we investigate the following questions: (1) What is the distribution of concreteness in the contexts of concrete and abstract target words? (2) What are the differences between concrete and abstract words in terms of contextual semantic diversity? (3) How does the entropy of concrete and abstract word contexts differ? Overall, our studies show consistent differences in the distributional representation of concrete and abstract words, thus challenging existing theories of cognition and providing a more fine-grained description of their nature.
Nowadays, social media have become a platform where people can easily express their opinions and emotions about any topic such as politics, movies, music, electronic products and many others. On the other hand, politicians, companies, and businesses are interested in analyzing automatically people’s opinions and emotions. In the last decade, a lot of efforts has been put into extracting sentiment polarity from texts. Recently, the focus has expanded to also cover emotion recognition from texts. In this work, we expand an existing emotion lexicon, DepecheMood, by leveraging semantic knowledge from English WordNet (EWN). We create an expanded lexicon, EmoWordNet, consisting of 67K terms aligned with EWN, almost 1.8 times the size of DepecheMood. We also evaluate EmoWordNet in an emotion recognition task using SemEval 2007 news headlines dataset and we achieve an improvement compared to the use of DepecheMood. EmoWordNet is publicly available to speed up research in the field on http://oma-project.com.
The aim of this work is to explore the possible limitations of existing methods of cross-language word embeddings evaluation, addressing the lack of correlation between intrinsic and extrinsic cross-language evaluation methods. To prove this hypothesis, we construct English-Russian datasets for extrinsic and intrinsic evaluation tasks and compare performances of 5 different cross-language models on them. The results say that the scores even on different intrinsic benchmarks do not correlate to each other. We can conclude that the use of human references as ground truth for cross-language word embeddings is not proper unless one does not understand how do native speakers process semantics in their cognition.
In this paper we analyze the use of emojis in social media with respect to gender and skin tone. By gathering a dataset of over twenty two million tweets from United States some findings are clearly highlighted after performing a simple frequency-based analysis. Moreover, we carry out a semantic analysis on the usage of emojis and their modifiers (e.g. gender and skin tone) by embedding all words, emojis and modifiers into the same vector space. Our analyses reveal that some stereotypes related to the skin color and gender seem to be reflected on the use of these modifiers. For example, emojis representing hand gestures are more widely utilized with lighter skin tones, and the usage across skin tones differs significantly. At the same time, the vector corresponding to the male modifier tends to be semantically close to emojis related to business or technology, whereas their female counterparts appear closer to emojis about love or makeup.
When we build a neural network model predicting the relationship between two sentences, the most general and intuitive approach is to use a Siamese architecture, where the sentence vectors obtained from a shared encoder is given as input to a classifier. For the classifier to work effectively, it is important to extract appropriate features from the two vectors and feed them as input. There exist several previous works that suggest heuristic-based function for matching sentence vectors, however it cannot be said that the heuristics tailored for a specific task generalize to other tasks. In this work, we propose a new matching function, ElBiS, that learns to model element-wise interaction between two vectors. From experiments, we empirically demonstrate that the proposed ElBiS matching function outperforms the concatenation-based or heuristic-based matching functions on natural language inference and paraphrase identification, while maintaining the fused representation compact.
A position paper arguing that purely graphical representations for natural language semantics lack a fundamental degree of expressiveness, and cannot deal with even basic Boolean operations like negation or disjunction. Moving from graphs to named graphs leads to representations that stand some chance of having sufficient expressive power. Named ℱℒ0 graphs are of particular interest.
Mining electronic health records for patients who satisfy a set of predefined criteria is known in medical informatics as phenotyping. Phenotyping has numerous applications such as outcome prediction, clinical trial recruitment, and retrospective studies. Supervised machine learning for phenotyping typically relies on sparse patient representations such as bag-of-words. We consider an alternative that involves learning patient representations. We develop a neural network model for learning patient representations and show that the learned representations are general enough to obtain state-of-the-art performance on a standard comorbidity detection task.
This paper shows how to take parse trees in CCG and algorithmically find the polarities of all the constituents. Our work uses the well-known polarization principle corresponding to function application, and we have extended this with principles for type raising and composition. We provide an algorithm, extending the polarity marking algorithm of van Benthem. We discuss how our system works in practice, taking input from the C&C parser.
We present a method for unsupervised lexical frame acquisition at the syntax–semantics interface. Given a set of input strings derived from dependency parses, our method generates a set of clusters that resemble lexical frame structures. Our work is motivated not only by its practical applications (e.g., to build, or expand the coverage of lexical frame databases), but also to gain linguistic insight into frame structures with respect to lexical distributions in relation to grammatical structures. We model our task using a hierarchical Bayesian network and employ tools and methods from latent variable probabilistic context free grammars (L-PCFGs) for statistical inference and parameter fitting, for which we propose a new split and merge procedure. We show that our model outperforms several baselines on a portion of the Wall Street Journal sentences that we have newly annotated for evaluation purposes.
Cross-lingual information extraction (CLIE) is an important and challenging task, especially in low resource scenarios. To tackle this challenge, we propose a training method, called Halo, which enforces the local region of each hidden state of a neural model to only generate target tokens with the same semantic structure tag. This simple but powerful technique enables a neural model to learn semantics-aware representations that are robust to noise, without introducing any extra parameter, thus yielding better generalization in both high and low resource settings.
Annotating temporal relations (TempRel) between events described in natural language is known to be labor intensive, partly because the total number of TempRels is quadratic in the number of events. As a result, only a small number of documents are typically annotated, limiting the coverage of various lexical/semantic phenomena. In order to improve existing approaches, one possibility is to make use of the readily available, partially annotated data (P as in partial) that cover more documents. However, missing annotations in P are known to hurt, rather than help, existing systems. This work is a case study in exploring various usages of P for TempRel extraction. Results show that despite missing annotations, P is still a useful supervision signal for this task within a constrained bootstrapping learning framework. The system described in this system is publicly available.
Neural word embeddings models (such as those built with word2vec) are known to have stability problems: when retraining a model with the exact same hyperparameters, words neighborhoods may change. We propose a method to estimate such variation, based on the overlap of neighbors of a given word in two models trained with identical hyperparameters. We show that this inherent variation is not negligible, and that it does not affect every word in the same way. We examine the influence of several features that are intrinsic to a word, corpus or embedding model and provide a methodology that can predict the variability (and as such, reliability) of a word representation in a semantic vector space.
Supervised distributional methods are applied successfully in lexical entailment, but recent work questioned whether these methods actually learn a relation between two words. Specifically, Levy et al. (2015) claimed that linear classifiers learn only separate properties of each word. We suggest a cheap and easy way to boost the performance of these methods by integrating multiplicative features into commonly used representations. We provide an extensive evaluation with different classifiers and evaluation setups, and suggest a suitable evaluation setup for the task, eliminating biases existing in previous ones.
We propose a practical model for named entity recognition (NER) that combines word and character-level information with a specific learned representation of the prefixes and suffixes of the word. We apply this approach to multilingual and multi-domain NER and show that it achieves state of the art results on the CoNLL 2002 Spanish and Dutch and CoNLL 2003 German NER datasets, consistently achieving 1.5-2.3 percent over the state of the art without relying on any dictionary features. Additionally, we show improvement on SemEval 2013 task 9.1 DrugNER, achieving state of the art results on the MedLine dataset and the second best results overall (-1.3% from state of the art). We also establish a new benchmark on the I2B2 2010 Clinical NER dataset with 84.70 F-score.
Fine-grained entity typing is the task of assigning fine-grained semantic types to entity mentions. We propose a neural architecture which learns a distributional semantic representation that leverages a greater amount of semantic context – both document and sentence level information – than prior work. We find that additional context improves performance, with further improvements gained by utilizing adaptive classification thresholds. Experiments show that our approach without reliance on hand-crafted features achieves the state-of-the-art results on three benchmark datasets.
We propose a hypothesis only baseline for diagnosing Natural Language Inference (NLI). Especially when an NLI dataset assumes inference is occurring based purely on the relationship between a context and a hypothesis, it follows that assessing entailment relations while ignoring the provided context is a degenerate solution. Yet, through experiments on 10 distinct NLI datasets, we find that this approach, which we refer to as a hypothesis-only model, is able to significantly outperform a majority-class baseline across a number of NLI datasets. Our analysis suggests that statistical irregularities may allow a model to perform NLI in some datasets beyond what should be achievable without access to the context.
We study the problem of measuring the quality of automatically-generated stories. We focus on the setting in which a few sentences of a story are provided and the task is to generate the next sentence (“continuation”) in the story. We seek to identify what makes a story continuation interesting, relevant, and have high overall quality. We crowdsource annotations along these three criteria for the outputs of story continuation systems, design features, and train models to predict the annotations. Our trained scorer can be used as a rich feature function for story generation, a reward function for systems that use reinforcement learning to learn to generate stories, and as a partial evaluation metric for story generation.
Existing methods of hypernymy detection mainly rely on statistics over a big corpus, either mining some co-occurring patterns like “animals such as cats” or embedding words of interest into context-aware vectors. These approaches are therefore limited by the availability of a large enough corpus that can cover all terms of interest and provide sufficient contextual information to represent their meaning. In this work, we propose a new paradigm, HyperDef, for hypernymy detection – expressing word meaning by encoding word definitions, along with context driven representation. This has two main benefits: (i) Definitional sentences express (sense-specific) corpus-independent meanings of words, hence definition-driven approaches enable strong generalization – once trained, the model is expected to work well in open-domain testbeds; (ii) Global context from a large corpus and definitions provide complementary information for words. Consequently, our model, HyperDef, once trained on task-agnostic data, gets state-of-the-art results in multiple benchmarks
Being able to predict whether people agree or disagree with an assertion (i.e. an explicit, self-contained statement) has several applications ranging from predicting how many people will like or dislike a social media post to classifying posts based on whether they are in accordance with a particular point of view. We formalize this as two NLP tasks: predicting judgments of (i) individuals and (ii) groups based on the text of the assertion and previous judgments. We evaluate a wide range of approaches on a crowdsourced data set containing over 100,000 judgments on over 2,000 assertions. We find that predicting individual judgments is a hard task with our best results only slightly exceeding a majority baseline, but that judgments of groups can be more reliably predicted using a Siamese neural network, which outperforms all other approaches by a wide margin.
Current methods for knowledge graph (KG) representation learning focus solely on the structure of the KG and do not exploit any kind of external information, such as visual and linguistic information corresponding to the KG entities. In this paper, we propose a multimodal translation-based approach that defines the energy of a KG triple as the sum of sub-energy functions that leverage both multimodal (visual and linguistic) and structural KG representations. Next, a ranking-based loss is minimized using a simple neural network architecture. Moreover, we introduce a new large-scale dataset for multimodal KG representation learning. We compared the performance of our approach to other baselines on two standard tasks, namely knowledge graph completion and triple classification, using our as well as the WN9-IMG dataset. The results demonstrate that our approach outperforms all baselines on both tasks and datasets.
While there have been many proposals for theories of semantic roles over the years, these models are mostly justified by intuition and the only evaluation methods have been inter-annotator agreement. We explore three different ideas for providing more rigorous theories of semantic roles. These ideas give rise to more objective criteria for designing role sets, and lend themselves to some experimental evaluation. We illustrate the discussion by examining the semantic roles in TRIPS.
Frame semantics is a well-established framework to represent the meaning of natural language in computational terms. In this work, we aim to propose a quantitative measure of relatedness between pairs of frame instances. We test our method on a dataset of sentence pairs, highlighting the correlation between our metric and human judgments of semantic similarity. Furthermore, we propose an application of our measure for clustering frame instances to extract prototypical knowledge from natural language.
Feature sparseness is a problem common to cross-domain and short-text classification tasks. To overcome this feature sparseness problem, we propose a novel method based on graph decomposition to find candidate features for expanding feature vectors. Specifically, we first create a feature-relatedness graph, which is subsequently decomposed into core-periphery (CP) pairs and use the peripheries as the expansion candidates of the cores. We expand both training and test instances using the computed related features and use them to train a text classifier. We observe that prioritising features that are common to both training and test instances as cores during the CP decomposition to further improve the accuracy of text classification. We evaluate the proposed CP-decomposition-based feature expansion method on benchmark datasets for cross-domain sentiment classification and short-text classification. Our experimental results show that the proposed method consistently outperforms all baselines on short-text classification tasks, and perform competitively with pivot-based cross-domain sentiment classification methods.
Words are polysemous and multi-faceted, with many shades of meanings. We suggest that sparse distributed representations are more suitable than other, commonly used, (dense) representations to express these multiple facets, and present Category Builder, a working system that, as we show, makes use of sparse representations to support multi-faceted lexical representations. We argue that the set expansion task is well suited to study these meaning distinctions since a word may belong to multiple sets with a different reason for membership in each. We therefore exhibit the performance of Category Builder on this task, while showing that our representation captures at the same time analogy problems such as “the Ganga of Egypt” or “the Voldemort of Tolkien”. Category Builder is shown to be a more expressive lexical representation and to outperform dense representations such as Word2Vec in some analogy classes despite being shown only two of the three input terms.
Tree-structured LSTMs have shown advantages in learning semantic representations by exploiting syntactic information. Most existing methods model tree structures by bottom-up combinations of constituent nodes using the same shared compositional function and often making use of input word information only. The inability to capture the richness of compositionality makes these models lack expressive power. In this paper, we propose multiplicative tree-structured LSTMs to tackle this problem. Our model makes use of not only word information but also relation information between words. It is more expressive, as different combination functions can be used for each child node. In addition to syntactic trees, we also investigate the use of Abstract Meaning Representation in tree-structured models, in order to incorporate both syntactic and semantic information from the sentence. Experimental results on common NLP tasks show the proposed models lead to better sentence representation and AMR brings benefits in complex tasks.