This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we generate only three BibTeX files per volume, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
Although it is widely agreed that world knowledge plays a significant role in quantifier scope disambiguation (QSD), there has been only very limited work on how to integrate this knowledge into a QSD model. This paper contributes to this scarce line of research by incorporating into a machine learning model our knowledge about relations, as conveyed by a manageable closed class of function words: prepositions. For data, we use a scope-disambiguated corpus created by AnderBois, Brasoveanu and Henderson, which is additionally annotated with prepositional senses using Schneider et al’s Semantic Network of Adposition and Case Supersenses (SNACS) scheme. By applying Manshadi and Allen’s method to the corpus, we were able to inspect the information gain provided by prepositions for the QSD task. Statistical analysis of the performance of the classifiers, trained in scenarios with and without preposition information, supports the claim that prepositional senses have a strong positive impact on the learnability of automatic QSD systems.
The current natural language processing is strongly focused on raising accuracy. The progress comes at a cost of super-heavy models with hundreds of millions or even billions of parameters. However, simple syntactic tasks such as part-of-speech (POS) tagging, dependency parsing or named entity recognition (NER) do not require the largest models to achieve acceptable results. In line with this assumption we try to minimize the size of the model that jointly performs all three tasks. We introduce ComboNER: a lightweight tool, orders of magnitude smaller than state-of-the-art transformers. It is based on pre-trained subword embeddings and recurrent neural network architecture. ComboNER operates on Polish language data. The model has outputs for POS tagging, dependency parsing and NER. Our paper contains some insights from fine-tuning of the model and reports its overall results.
This paper introduces the SAMSum Corpus, a new dataset with abstractive dialogue summaries. We investigate the challenges it poses for automated summarization by testing several models and comparing their results with those obtained on a corpus of news articles. We show that model-generated summaries of dialogues achieve higher ROUGE scores than the model-generated summaries of news – in contrast with human evaluators’ judgement. This suggests that a challenging task of abstractive dialogue summarization requires dedicated models and non-standard quality measures. To our knowledge, our study is the first attempt to introduce a high-quality chat-dialogues corpus, manually annotated with abstractive summarizations, which can be used by the research community for further studies.
The goal of our paper is to compare psycholinguistic text features with fact checking approaches to distinguish lies from true statements. We examine both methods using data from a large ongoing study on deception and deception detection covering a mixture of factual and opinionated topics that polarize public opinion. We conclude that fact checking approaches based on Wikipedia are too limited for this task, as only a few percent of sentences from our study has enough evidence to become supported or refuted. Psycholinguistic features turn out to outperform both fact checking and human baselines, but the accuracy is not high. Overall, it appears that deception detection applicable to less-than-obvious topics is a difficult task and a problem to be solved.
The goal of this paper is to use all available Polish language data sets to seek the best possible performance in supervised sentiment analysis of short texts. We use text collections with labelled sentiment such as tweets, movie reviews and a sentiment treebank, in three comparison modes. In the first, we examine the performance of models trained and tested on the same text collection using standard cross-validation (in-domain). In the second we train models on all available data except the given test collection, which we use for testing (one vs rest cross-domain). In the third, we train a model on one data set and apply it to another one (one vs one cross-domain). We compare wide range of methods including machine learning on bag-of-words representation, bidirectional recurrent neural networks as well as the most recent pre-trained architectures ELMO and BERT. We formulate conclusions as to cross-domain and in-domain performance of each method. Unsurprisingly, BERT turned out to be a strong performer, especially in the cross-domain setting. What is surprising however, is solid performance of the relatively simple multinomial Naive Bayes classifier, which performed equally well as BERT on several data sets.
The article describes our submission to SemEval 2019 Task 8 on Fact-Checking in Community Forums. The systems under discussion participated in Subtask A: decide whether a question asks for factual information, opinion/advice or is just socializing. Our primary submission was ranked as the second one among all participants in the official evaluation phase. The article presents our primary solution: Deeply Regularized Residual Neural Network (DRR NN) with Universal Sentence Encoder embeddings. This is followed by a description of two contrastive solutions based on ensemble methods.
The paper addresses the classification of isolated Polish adjective-noun phrases according to their metaphoricity. We tested neural networks to predict if a phrase has a literal or metaphorical sense or can have both senses depending on usage. The input to the neural network consists of word embeddings, but we also tested the impact of information about the domain of the adjective and about the abstractness of the noun. We applied our solution to English data available on the Internet and compared it to results published in papers. We found that the solution based on word embeddings only can achieve results comparable with complex solutions requiring additional information.
The paper addresses detection of figurative usage of words in English text. The chosen method was to use neural nets fed by pretrained word embeddings. The obtained results show that simple solutions, based on words embeddings only, are comparable to complex solutions, using many sources of information which are not available for languages less-studied than English.
This paper describes multiple solutions designed and tested for the problem of word-level metaphor detection. The proposed systems are all based on variants of recurrent neural network architectures. Specifically, we explore multiple sources of information: pre-trained word embeddings (Glove), a dictionary of language concreteness and a transfer learning scenario based on the states of an encoder network from neural network machine translation system. One of the architectures is based on combining all three systems: (1) Neural CRF (Conditional Random Fields), trained directly on the metaphor data set; (2) Neural Machine Translation encoder of a transfer learning scenario; (3) a neural network used to predict final labels, trained directly on the metaphor data set. Our results vary between test sets: Neural CRF standalone is the best one on submission data, while combined system scores the highest on a test subset randomly selected from training data.
In this paper we describe experiments with automated detection of metaphors in the Polish language. We focus our analysis on noun phrases composed of an adjective and a noun, and distinguish three types of expressions: with literal sense, with metaphorical sense, and expressions both literal and methaphorical (context-dependent). We propose a method of automatically recognizing expression type using word embeddings and neural networks. We evaluate multiple neural network architectures and demonstrate that the method significantly outperforms strong baselines.
This paper compares two approaches to word sense disambiguation using word embeddings trained on unambiguous synonyms. The first is unsupervised method based on computing log probability from sequences of word embedding vectors, taking into account ambiguous word senses and guessing correct sense from context. The second method is supervised. We use a multilayer neural network model to learn a context-sensitive transformation that maps an input vector of ambiguous word into an output vector representing its sense. We evaluate both methods on corpora with manual annotations of word senses from the Polish wordnet (plWordnet).
The paper contains a description of OPFI: Opinion Finder for the Polish Language, a freely available tool for opinion target extraction. The goal of the tool is opinion finding: a task of identifying tuples composed of sentiment (positive or negative) and its target (about what or whom is the sentiment expressed). OPFI is not dependent on any particular method of sentiment identification and provides a built-in sentiment dictionary as a convenient option. Technically, it contains implementations of three different modes of opinion tuple generation: one hybrid based on dependency parsing and CRF, the second based on shallow parsing and the third on deep learning, namely GRU neural network. The paper also contains a description of related language resources: two annotated treebanks and one set of tweets.
Existing approaches to classifying documents by sentiment include machine learning with features created from n-grams and part of speech. This paper explores a different approach and examines performance of one selected machine learning algorithm, Support Vector Machines, with features computed using existing lexical resources. Special attention has been paid to fine tuning of the algorithm regarding number of features. The immediate purpose of this experiment is to evaluate lexical and sentiment resources in document-level sentiment classification task. Results described in the paper are also useful to indicate how lexicon design, different language dimensions and semantic categories contribute to document-level sentiment recognition. In a less direct way (through the examination of evaluated resources), the experiment analyzes adequacy of lexemes, word senses and synsets as different possible layers for ascribing sentiment, or as candidates for sentiment carriers. The proposed approach of machine learning word category frequencies instead of n-grams and part of speech features can potentially exhibit improvements in domain independency, but this hypothesis has to be verified in future works.