Erik Velldal

2021

pdf bib abs
Large-Scale Contextualised Language Modelling for Norwegian
Andrey Kutuzov | Jeremy Barnes | Erik Velldal | Lilja Øvrelid | Stephan Oepen
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

We present the ongoing NorLM initiative to support the creation and use of very large contextualised language models for Norwegian (and in principle other Nordic languages), including a ready-to-use software environment, as well as an experience report for data preparation and training. This paper introduces the first large-scale monolingual language models for Norwegian, based on both the ELMo and BERT frameworks. In addition to detailing the training process, we present contrastive benchmark results on a suite of NLP tasks for Norwegian. For additional background and access to the data, models, and software, please see: http://norlm.nlpl.eu

pdf bib abs
Negation in Norwegian: an annotated dataset
Petter Mæhlum | Jeremy Barnes | Robin Kurtz | Lilja Øvrelid | Erik Velldal
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

This paper introduces NorecNeg – the first annotated dataset of negation for Norwegian. Negation cues and their in-sentence scopes have been annotated across more than 11K sentences spanning more than 400 documents for a subset of the Norwegian Review Corpus (NoReC). In addition to providing in-depth discussion of the annotation guidelines, we also present a first set of benchmark results based on a graph-parsing approach.

pdf bib abs
Multilingual ELMo and the Effects of Corpus Sampling
Vinit Ravishankar | Andrey Kutuzov | Lilja Øvrelid | Erik Velldal
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

Multilingual pretrained language models are rapidly gaining popularity in NLP systems for non-English languages. Most of these models feature an important corpus sampling step in the process of accumulating training data in different languages, to ensure that the signal from better resourced languages does not drown out poorly resourced ones. In this study, we train multiple multilingual recurrent language models, based on the ELMo architecture, and analyse both the effect of varying corpus size ratios on downstream performance, as well as the performance difference between monolingual models for each language, and broader multilingual language models. As part of this effort, we also make these trained models available for public use.

pdf bib abs
If you’ve got it, flaunt it: Making the most of fine-grained sentiment annotations
Jeremy Barnes | Lilja Øvrelid | Erik Velldal
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Fine-grained sentiment analysis attempts to extract sentiment holders, targets and polar expressions and resolve the relationship between them, but progress has been hampered by the difficulty of annotation. Targeted sentiment analysis, on the other hand, is a more narrow task, focusing on extracting sentiment targets and classifying their polarity. In this paper, we explore whether incorporating holder and expression information can improve target extraction and classification and perform experiments on eight English datasets. We conclude that jointly predicting target and polarity BIO labels improves target extraction, and that augmenting the input text with gold expressions generally improves targeted polarity classification. This highlights the potential importance of annotating expressions for fine-grained sentiment datasets. At the same time, our results show that performance of current models for predicting polar expressions is poor, hampering the benefit of this information in practice.

pdf bib abs
Using Gender- and Polarity-Informed Models to Investigate Bias
Samia Touileb | Lilja Øvrelid | Erik Velldal
Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing

In this work we explore the effect of incorporating demographic metadata in a text classifier trained on top of a pre-trained transformer language model. More specifically, we add information about the gender of critics and book authors when classifying the polarity of book reviews, and the polarity of the reviews when classifying the genders of authors and critics. We use an existing data set of Norwegian book reviews with ratings by professional critics, which has also been augmented with gender information, and train a document-level sentiment classifier on top of a recently released Norwegian BERT-model. We show that gender-informed models obtain substantially higher accuracy, and that polarity-informed models obtain higher accuracy when classifying the genders of book authors. For this particular data set, we take this result as a confirmation of the gender bias in the underlying label distribution, but in other settings we believe a similar approach can be used for mitigating bias in the model.

pdf bib abs
Structured Sentiment Analysis as Dependency Graph Parsing
Jeremy Barnes | Robin Kurtz | Stephan Oepen | Lilja Øvrelid | Erik Velldal
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Structured sentiment analysis attempts to extract full opinion tuples from a text, but over time this task has been subdivided into smaller and smaller sub-tasks, e.g., target extraction or targeted polarity classification. We argue that this division has become counterproductive and propose a new unified framework to remedy the situation. We cast the structured sentiment problem as dependency graph parsing, where the nodes are spans of sentiment holders, targets and expressions, and the arcs are the relations between them. We perform experiments on five datasets in four languages (English, Norwegian, Basque, and Catalan) and show that this approach leads to strong improvements over state-of-the-art baselines. Our analysis shows that refining the sentiment graphs with syntactic dependency information further improves results.

2020

pdf bib abs
Gender and sentiment, critics and authors: a dataset of Norwegian book reviews
Samia Touileb | Lilja Øvrelid | Erik Velldal
Proceedings of the Second Workshop on Gender Bias in Natural Language Processing

Gender bias in models and datasets is widely studied in NLP. The focus has usually been on analysing how females and males express themselves, or how females and males are described. However, a less studied aspect is the combination of these two perspectives, how female and male describe the same or opposite gender. In this paper, we present a new gender annotated sentiment dataset of critics reviewing the works of female and male authors. We investigate if this newly annotated dataset contains differences in how the works of male and female authors are critiqued, in particular in terms of positive and negative sentiment. We also explore the differences in how this is done by male and female critics. We show that there are differences in how critics assess the works of authors of the same or opposite gender. For example, male critics rate crime novels written by females, and romantic and sentimental works written by males, more negatively.

pdf bib abs
NorNE: Annotating Named Entities for Norwegian
Fredrik Jørgensen | Tobias Aasmoe | Anne-Stine Ruud Husevåg | Lilja Øvrelid | Erik Velldal
Proceedings of the 12th Language Resources and Evaluation Conference

This paper presents NorNE, a manually annotated corpus of named entities which extends the annotation of the existing Norwegian Dependency Treebank. Comprising both of the official standards of written Norwegian (Bokmål and Nynorsk), the corpus contains around 600,000 tokens and annotates a rich set of entity types including persons, organizations, locations, geo-political entities, products, and events, in addition to a class corresponding to nominals derived from names. We here present details on the annotation effort, guidelines, inter-annotator agreement and an experimental analysis of the corpus using a neural sequence labeling architecture.

pdf bib abs
A Fine-grained Sentiment Dataset for Norwegian
Lilja Øvrelid | Petter Mæhlum | Jeremy Barnes | Erik Velldal
Proceedings of the 12th Language Resources and Evaluation Conference

We here introduce NoReC_fine, a dataset for fine-grained sentiment analysis in Norwegian, annotated with respect to polar expressions, targets and holders of opinion. The underlying texts are taken from a corpus of professionally authored reviews from multiple news-sources and across a wide variety of domains, including literature, games, music, products, movies and more. We here present a detailed description of this annotation effort. We provide an overview of the developed annotation guidelines, illustrated with examples and present an analysis of inter-annotator agreement. We also report the first experimental results on the dataset, intended as a preliminary benchmark for further experiments.

2019

pdf bib abs
Probing Multilingual Sentence Representations With X-Probe
Vinit Ravishankar | Lilja Øvrelid | Erik Velldal
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

This paper extends the task of probing sentence representations for linguistic insight in a multilingual domain. In doing so, we make two contributions: first, we provide datasets for multilingual probing, derived from Wikipedia, in five languages, viz. English, French, German, Spanish and Russian. Second, we evaluate six sentence encoders for each language, each trained by mapping sentence representations to English sentence representations, using sentences in a parallel corpus. We discover that cross-lingually mapped representations are often better at retaining certain linguistic information than representations derived from English encoders trained on natural language inference (NLI) as a downstream task.

pdf bib abs
One-to-X Analogical Reasoning on Word Embeddings: a Case for Diachronic Armed Conflict Prediction from News Texts
Andrey Kutuzov | Erik Velldal | Lilja Øvrelid
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

We extend the well-known word analogy task to a one-to-X formulation, including one-to-none cases, when no correct answer exists. The task is cast as a relation discovery problem and applied to historical armed conflicts datasets, attempting to predict new relations of type ‘location:armed-group’ based on data about past events. As the source of semantic information, we use diachronic word embedding models trained on English news texts. A simple technique to improve diachronic performance in such task is demonstrated, using a threshold based on a function of cosine distance to decrease the number of false positives; this approach is shown to be beneficial on two different corpora. Finally, we publish a ready-to-use test set for one-to-X analogy evaluation on historical armed conflicts data.

pdf bib abs
Measuring Diachronic Evolution of Evaluative Adjectives with Word Embeddings: the Case for English, Norwegian, and Russian
Julia Rodina | Daria Bakshandaeva | Vadim Fomin | Andrey Kutuzov | Samia Touileb | Erik Velldal
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

We measure the intensity of diachronic semantic shifts in adjectives in English, Norwegian and Russian across 5 decades. This is done in order to test the hypothesis that evaluative adjectives are more prone to temporal semantic change. To this end, 6 different methods of quantifying semantic change are used. Frequency-controlled experimental results show that, depending on the particular method, evaluative adjectives either do not differ from other types of adjectives in terms of semantic change or appear to actually be less prone to shifting (particularly, to ‘jitter’-type shifting). Thus, in spite of many well-known examples of semantically changing evaluative adjectives (like ‘terrific’ or ‘incredible’), it seems that such cases are not specific to this particular type of words.

pdf bib abs
Sentiment Analysis Is Not Solved! Assessing and Probing Sentiment Classification
Jeremy Barnes | Lilja Øvrelid | Erik Velldal
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Neural methods for sentiment analysis have led to quantitative improvements over previous approaches, but these advances are not always accompanied with a thorough analysis of the qualitative differences. Therefore, it is not clear what outstanding conceptual challenges for sentiment analysis remain. In this work, we attempt to discover what challenges still prove a problem for sentiment classifiers for English and to provide a challenging dataset. We collect the subset of sentences that an (oracle) ensemble of state-of-the-art sentiment classifiers misclassify and then annotate them for 18 linguistic and paralinguistic phenomena, such as negation, sarcasm, modality, etc. Finally, we provide a case study that demonstrates the usefulness of the dataset to probe the performance of a given sentiment classifier with respect to linguistic phenomena.

pdf bib abs
Annotating evaluative sentences for sentiment analysis: a dataset for Norwegian
Petter Mæhlum | Jeremy Barnes | Lilja Øvrelid | Erik Velldal
Proceedings of the 22nd Nordic Conference on Computational Linguistics

This paper documents the creation of a large-scale dataset of evaluative sentences – i.e. both subjective and objective sentences that are found to be sentiment-bearing – based on mixed-domain professional reviews from various news-sources. We present both the annotation scheme and first results for classification experiments. The effort represents a step toward creating a Norwegian dataset for fine-grained sentiment analysis.

pdf bib abs
Lexicon information in neural sentiment analysis: a multi-task learning approach
Jeremy Barnes | Samia Touileb | Lilja Øvrelid | Erik Velldal
Proceedings of the 22nd Nordic Conference on Computational Linguistics

This paper explores the use of multi-task learning (MTL) for incorporating external knowledge in neural models. Specifically, we show how MTL can enable a BiLSTM sentiment classifier to incorporate information from sentiment lexicons. Our MTL set-up is shown to improve model performance (compared to a single-task set-up) on both English and Norwegian sentence-level sentiment datasets. The paper also introduces a new sentiment lexicon for Norwegian.

pdf bib abs
Multilingual Probing of Deep Pre-Trained Contextual Encoders
Vinit Ravishankar | Memduh Gökırmak | Lilja Øvrelid | Erik Velldal
Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing

Encoders that generate representations based on context have, in recent years, benefited from adaptations that allow for pre-training on large text corpora. Earlier work on evaluating fixed-length sentence representations has included the use of ‘probing’ tasks, that use diagnostic classifiers to attempt to quantify the extent to which these encoders capture specific linguistic phenomena. The principle of probing has also resulted in extended evaluations that include relatively newer word-level pre-trained encoders. We build on probing tasks established in the literature and comprehensively evaluate and analyse – from a typological perspective amongst others – multilingual variants of existing encoders on probing datasets constructed for 6 non-English languages. Specifically, we probe each layer of a multiple monolingual RNN-based ELMo models, the transformer-based BERT’s cased and uncased multilingual variants, and a variant of BERT that uses a cross-lingual modelling scheme (XLM).

2018

pdf bib abs
Transfer and Multi-Task Learning for Noun–Noun Compound Interpretation
Murhaf Fares | Stephan Oepen | Erik Velldal
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In this paper, we empirically evaluate the utility of transfer and multi-task learning on a challenging semantic classification task: semantic interpretation of noun–noun compounds. Through a comprehensive series of experiments and in-depth error analysis, we show that transfer learning via parameter initialization and multi-task learning via parameter sharing can help a neural classification model generalize over a highly skewed distribution of relations. Further, we demonstrate how dual annotation with two distinct sets of relations over the same set of compounds can be exploited to improve the overall accuracy of a neural classifier and its F1 scores on the less frequent, but more difficult relations.

pdf bib abs
Diachronic word embeddings and semantic shifts: a survey
Andrey Kutuzov | Lilja Øvrelid | Terrence Szymanski | Erik Velldal
Proceedings of the 27th International Conference on Computational Linguistics

Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models. However, this vein of research lacks the cohesion, common terminology and shared practices of more established areas of natural language processing. In this paper, we survey the current state of academic research related to diachronic word embeddings and semantic shifts detection. We start with discussing the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models. We propose several axes along which these methods can be compared, and outline the main challenges before this emerging subfield of NLP, as well as prospects and possible applications.

2017

pdf bib abs
Temporal dynamics of semantic relations in word embeddings: an application to predicting armed conflict participants
Andrey Kutuzov | Erik Velldal | Lilja Øvrelid
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This paper deals with using word embedding models to trace the temporal dynamics of semantic relations between pairs of words. The set-up is similar to the well-known analogies task, but expanded with a time dimension. To this end, we apply incremental updating of the models with new training texts, including incremental vocabulary expansion, coupled with learned transformation matrices that let us map between members of the relation. The proposed approach is evaluated on the task of predicting insurgent armed groups based on geographical locations. The gold standard data for the time span 1994–2010 is extracted from the UCDP Armed Conflicts dataset. The results show that the method is feasible and outperforms the baselines, but also that important work still remains to be done.

pdf bib
Joint UD Parsing of Norwegian Bokmål and Nynorsk
Erik Velldal | Lilja Øvrelid | Petter Hohle
Proceedings of the 21st Nordic Conference on Computational Linguistics

pdf bib
Optimizing a PoS Tagset for Norwegian Dependency Parsing
Petter Hohle | Lilja Øvrelid | Erik Velldal
Proceedings of the 21st Nordic Conference on Computational Linguistics

pdf bib
Word vectors, reuse, and replicability: Towards a community repository of large-text resources
Murhaf Fares | Andrey Kutuzov | Stephan Oepen | Erik Velldal
Proceedings of the 21st Nordic Conference on Computational Linguistics

pdf bib
Wordnet extension via word embeddings: Experiments on the Norwegian Wordnet
Heidi Sand | Erik Velldal | Lilja Øvrelid
Proceedings of the 21st Nordic Conference on Computational Linguistics

For decades, most self-respecting linguistic engineering initiatives have designed and implemented custom representations for various layers of, for example, morphological, syntactic, and semantic analysis. Despite occasional efforts at harmonization or even standardization, our field today is blessed with a multitude of ways of encoding and exchanging linguistic annotations of these types, both at the levels of ‘abstract syntax’, naming choices, and of course file formats. To a large degree, it is possible to work within and across design plurality by conversion, and often there may be good reasons for divergent design reflecting differences in use. However, it is likely that some abstract commonalities across choices of representation are obscured by more superficial differences, and conversely there is no obvious procedure to tease apart what actually constitute contentful vs. mere technical divergences. In this study, we seek to conceptually align three representations for common types of morpho-syntactic analysis, pinpoint what in our view constitute contentful differences, and reflect on the underlying principles and specific requirements that led to individual choices. We expect that a more in-depth understanding of these choices across designs may led to increased harmonization, or at least to more informed design of future representations.

pdf bib abs
An open-source tool for negation detection: a maximum-margin approach
Martine Enger | Erik Velldal | Lilja Øvrelid
Proceedings of the Workshop Computational Semantics Beyond Events and Roles

This paper presents an open-source toolkit for negation detection. It identifies negation cues and their corresponding scope in either raw or parsed text using maximum-margin classification. The system design draws on best practice from the existing literature on negation detection, aiming for a simple and portable system that still achieves competitive performance. Pre-trained models and experimental results are provided for English.

pdf bib abs
Tracing armed conflicts with diachronic word embedding models
Andrey Kutuzov | Erik Velldal | Lilja Øvrelid
Proceedings of the Events and Stories in the News Workshop

Recent studies have shown that word embedding models can be used to trace time-related (diachronic) semantic shifts in particular words. In this paper, we evaluate some of these approaches on the new task of predicting the dynamics of global armed conflicts on a year-to-year basis, using a dataset from the conflict research field as the gold standard and the Gigaword news corpus as the training data. The results show that much work still remains in extracting ‘cultural’ semantic shifts from diachronic word embedding models. At the same time, we present a new task complete with an evaluation set and introduce the ‘anchor words’ method which outperforms previous approaches on this set.

2016

pdf bib
Threat detection in online discussions
Aksel Wester | Lilja Øvrelid | Erik Velldal | Hugo Lewi Hammer
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf bib
Redefining part-of-speech classes with distributional semantic models
Andrey Kutuzov | Erik Velldal | Lilja Øvrelid
Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning

pdf bib abs
A Corpus of Clinical Practice Guidelines Annotated with the Importance of Recommendations
Jonathon Read | Erik Velldal | Marc Cavazza | Gersende Georg
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we present the Corpus of REcommendation STrength (CREST), a collection of HTML-formatted clinical guidelines annotated with the location of recommendations. Recommendations are labelled with an author-provided indicator of their strength of importance. As data was drawn from many disparate authors, we define a unified scheme of importance labels, and provide a mapping for each guideline. We demonstrate the utility of the corpus and its annotations in some initial measurements investigating the type of language constructions associated with strong and weak recommendations, and experiments into promising features for recommendation classification, both with respect to strong and weak labels, and to all labels of the unified scheme. An error analysis indicates that, while there is a strong relationship between lexical choices and strength labels, there can be substantial variance in the choices made by different authors.

2015

pdf bib
Improving cross-domain dependency parsing with dependency-derived clusters
Jostein Lien | Erik Velldal | Lilja Øvrelid
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

2014

pdf bib abs
Off-Road LAF: Encoding and Processing Annotations in NLP Workflows
Emanuele Lapponi | Erik Velldal | Stephan Oepen | Rune Lain Knudsen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The Linguistic Annotation Framework (LAF) provides an abstract data model for specifying interchange representations to ensure interoperability among different annotation formats. This paper describes an ongoing effort to adapt the LAF data model as the interchange representation in complex workflows as used in the Language Analysis Portal (LAP), an on-line and large-scale processing service that is developed as part of the Norwegian branch of the Common Language Resources and Technology Infrastructure (CLARIN) initiative. Unlike several related on-line processing environments, which predominantly instantiate a distributed architecture of web services, LAP achives scalability to potentially very large data volumes through integration with the Norwegian national e-Infrastructure, and in particular job sumission to a capacity compute cluster. This setup leads to tighter integration requirements and also calls for efficient, low-overhead communication of (intermediate) processing results with workflows. We meet these demands by coupling the LAF data model with a lean, non-redundant JSON-based interchange format and integration of an agile and performant NoSQL database, allowing parallel access from cluster nodes, as the central repository of linguistic annotation.

pdf bib
Predicting Party Affiliations from European Parliament Debates
Bjørn Høyland | Jean-François Godbout | Emanuele Lapponi | Erik Velldal
Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science

In this paper we describe and evaluate different statistical models for the task of realization ranking, i.e. the problem of discriminating between competing surface realizations generated for a given input semantics. Three models are trained and tested; an n-gram language model, a discriminative maximum entropy model using structural features, and a combination of these two. Our realization component forms part of a larger, hybrid MT system.

2004

Co-authors

Venues

CL1

ACL1

Erik Velldal

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2007

2006

2005

2004

Co-authors

Venues