Lilja Øvrelid

Also published as: Lilja Ovrelid

2022

pdf bib abs
EventGraph: Event Extraction as Semantic Graph Parsing
Huiling You | David Samuel | Samia Touileb | Lilja Øvrelid
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

Event extraction involves the detection and extraction of both the event triggers and the corresponding arguments. Existing systems often decompose event extraction into multiple subtasks, without considering their possible interactions. In this paper, we propose EventGraph, a joint framework for event extraction, which encodes events as graphs. We represent event triggers and arguments as nodes in a semantic graph. Event extraction therefore becomes a graph parsing problem, which provides the following advantages: 1) performing event detection and argument extraction jointly; 2) detecting and extracting multiple events from a piece of text; 3) capturing the complicated interaction between event arguments and triggers. Experimental results on ACE2005 show that our model is competitive to state-of-the-art systems and has substantially improved the results on argument extraction. Additionally, we create two new datasets from ACE2005 where we keep the entire text spans for event arguments, instead of just the head word(s). Our code and models will be released as open-source.

pdf abs
EventGraph at CASE 2021 Task 1: A General Graph-based Approach to Protest Event Extraction
Huiling You | David Samuel | Samia Touileb | Lilja Øvrelid
Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)

This paper presents our submission to the 2022 edition of the CASE 2021 shared task 1, subtask 4. The EventGraph system adapts an end-to-end, graph-based semantic parser to the task of Protest Event Extraction and more specifically subtask 4 on event trigger and argument extraction. We experiment with various graphs, encoding the events as either “labeled-edge” or “node-centric” graphs. We show that the “node-centric” approach yields best results overall, performing well across the three languages of the task, namely English, Spanish, and Portuguese. EventGraph is ranked 3rd for English and Portuguese, and 4th for Spanish.

We present the Norwegian Anaphora Resolution Corpus (NARC), the first publicly available corpus annotated with anaphoric relations between noun phrases for Norwegian. The paper describes the annotated data for 326 documents in Norwegian Bokmål, together with inter-annotator agreement and discussions of relevant statistics. We also present preliminary modelling results which are comparable to existing corpora for other languages, and discuss relevant problems in relation to both modelling and the annotations themselves.

pdf abs
Bootstrapping Text Anonymization Models with Distant Supervision
Anthi Papadopoulou | Pierre Lison | Lilja Øvrelid | Ildikó Pilán
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We propose a novel method to bootstrap text anonymization models based on distant supervision. Instead of requiring manually labeled training data, the approach relies on a knowledge graph expressing the background information assumed to be publicly available about various individuals. This knowledge graph is employed to automatically annotate text documents including personal data about a subset of those individuals. More precisely, the method determines which text spans ought to be masked in order to guarantee k-anonymity, assuming an adversary with access to both the text documents and the background information expressed in the knowledge graph. The resulting collection of labeled documents is then used as training data to fine-tune a pre-trained language model for text anonymization. We illustrate this approach using a knowledge graph extracted from Wikidata and short biographical texts from Wikipedia. Evaluation results with a RoBERTa-based model and a manually annotated collection of 553 summaries showcase the potential of the approach, but also unveil a number of issues that may arise if the knowledge graph is noisy or incomplete. The results also illustrate that, contrary to most sequence labeling problems, the text anonymization task may admit several alternative solutions.

This paper demonstrates how a graph-based semantic parser can be applied to the task of structured sentiment analysis, directly predicting sentiment graphs from text. We advance the state of the art on 4 out of 5 standard benchmark sets. We release the source code, models and predictions.

pdf abs
Occupational Biases in Norwegian and Multilingual Language Models
Samia Touileb | Lilja Øvrelid | Erik Velldal
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

In this paper we explore how a demographic distribution of occupations, along gender dimensions, is reflected in pre-trained language models. We give a descriptive assessment of the distribution of occupations, and investigate to what extent these are reflected in four Norwegian and two multilingual models. To this end, we introduce a set of simple bias probes, and perform five different tasks combining gendered pronouns, first names, and a set of occupations from the Norwegian statistics bureau. We show that language specific models obtain more accurate results, and are much closer to the real-world distribution of clearly gendered occupations. However, we see that none of the models have correct representations of the occupations that are demographically balanced between genders. We also discuss the importance of the training data on which the models were trained on, and argue that template-based bias probes can sometimes be fragile, and a simple alteration in a template can change a model’s behavior.

pdf abs
Contextualized embeddings for semantic change detection: Lessons learned
Andrey Kutuzov | Erik Velldal | Lilja Øvrelid
Northern European Journal of Language Technology, Volume 8

We present a qualitative analysis of the (potentially erroneous) outputs of contextualized embedding-based methods for detecting diachronic semantic change. First, we introduce an ensemble method outperforming previously described contextualized approaches. This method is used as a basis for an in-depth analysis of the degrees of semantic change predicted for English words across 5 decades. Our findings show that contextualized methods can often predict high change scores for words which are not undergoing any real diachronic semantic shift in the lexicographic sense of the term (or at least the status of these shifts is questionable). Such challenging cases are discussed in detail with examples, and their linguistic categorization is proposed. Our conclusion is that pre-trained contextualized language models are prone to confound changes in lexicographic senses and changes in contextual variance, which naturally stem from their distributional nature, but is different from the types of issues observed in methods based on static embeddings. Additionally, they often merge together syntactic and semantic aspects of lexical entities. We propose a range of possible future solutions to these issues.

pdf abs
Neural Text Sanitization with Explicit Measures of Privacy Risk
Anthi Papadopoulou | Yunhao Yu | Pierre Lison | Lilja Øvrelid
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We present a novel approach for text sanitization, which is the task of editing a document to mask all (direct and indirect) personal identifiers and thereby conceal the identity of the individuals(s) mentioned in the text. In contrast to previous work, the approach relies on explicit measures of privacy risk, making it possible to explicitly control the trade-off between privacy protection and data utility. The approach proceeds in three steps. A neural, privacy-enhanced entity recognizer is first employed to detect and classify potential personal identifiers. We then determine which entities, or combination of entities, are likely to pose a re-identification risk through a range of privacy risk assessment measures. We present three such measures of privacy risk, respectively based on (1) span probabilities derived from a BERT language model, (2) web search queries and (3) a classifier trained on labelled data. Finally, a linear optimization solver decides which entities to mask to minimize the semantic loss while simultaneously ensuring that the estimated privacy risk remains under a given threshold. We evaluate the approach both in the absence and presence of manually annotated data. Our results highlight the potential of the approach, as well as issues specific types of personal data can introduce to the process.

In this paper, we introduce the first SemEval shared task on Structured Sentiment Analysis, for which participants are required to predict all sentiment graphs in a text, where a single sentiment graph is composed of a sentiment holder, target, expression and polarity. This new shared task includes two subtracks (monolingual and cross-lingual) with seven datasets available in five languages, namely Norwegian, Catalan, Basque, Spanish and English. Participants submitted their predictions on a held-out test set and were evaluated on Sentiment Graph F1 . Overall, the task received over 200 submissions from 32 participating teams. We present the results of the 15 teams that provided system descriptions and our own expanded analysis of the test predictions.

pdf abs
Entity-Level Sentiment Analysis (ELSA): An Exploratory Task Survey
Egil Rønningstad | Erik Velldal | Lilja Øvrelid
Proceedings of the 29th International Conference on Computational Linguistics

This paper explores the task of identifying the overall sentiment expressed towards volitional entities (persons and organizations) in a document - what we refer to as Entity-Level Sentiment Analysis (ELSA). While identifying sentiment conveyed towards an entity is well researched for shorter texts like tweets, we find little to no research on this specific task for longer texts with multiple mentions and opinions towards the same entity. This lack of research would be understandable if ELSA can be derived from existing tasks and models. To assess this, we annotate a set of professional reviews for their overall sentiment towards each volitional entity in the text. We sample from data already annotated for document-level, sentence-level, and target-level sentiment in a multi-domain review corpus, and our results indicate that there is no single proxy task that provides this overall sentiment we seek for the entities at a satisfactory level of performance. We present a suite of experiments aiming to assess the contribution towards ELSA provided by document-, sentence-, and target-level sentiment analysis, and provide a discussion of their shortcomings. We show that sentiment in our dataset is expressed not only with an entity mention as target, but also towards targets with a sentiment-relevant relation to a volitional entity. In our data, these relations extend beyond anaphoric coreference resolution, and our findings call for further research of the topic. Finally, we also present a survey of previous relevant work.

2021

pdf abs
Using Gender- and Polarity-Informed Models to Investigate Bias
Samia Touileb | Lilja Øvrelid | Erik Velldal
Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing

In this work we explore the effect of incorporating demographic metadata in a text classifier trained on top of a pre-trained transformer language model. More specifically, we add information about the gender of critics and book authors when classifying the polarity of book reviews, and the polarity of the reviews when classifying the genders of authors and critics. We use an existing data set of Norwegian book reviews with ratings by professional critics, which has also been augmented with gender information, and train a document-level sentiment classifier on top of a recently released Norwegian BERT-model. We show that gender-informed models obtain substantially higher accuracy, and that polarity-informed models obtain higher accuracy when classifying the genders of book authors. For this particular data set, we take this result as a confirmation of the gender bias in the underlying label distribution, but in other settings we believe a similar approach can be used for mitigating bias in the model.

pdf abs
If you’ve got it, flaunt it: Making the most of fine-grained sentiment annotations
Jeremy Barnes | Lilja Øvrelid | Erik Velldal
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Fine-grained sentiment analysis attempts to extract sentiment holders, targets and polar expressions and resolve the relationship between them, but progress has been hampered by the difficulty of annotation. Targeted sentiment analysis, on the other hand, is a more narrow task, focusing on extracting sentiment targets and classifying their polarity. In this paper, we explore whether incorporating holder and expression information can improve target extraction and classification and perform experiments on eight English datasets. We conclude that jointly predicting target and polarity BIO labels improves target extraction, and that augmenting the input text with gold expressions generally improves targeted polarity classification. This highlights the potential importance of annotating expressions for fine-grained sentiment datasets. At the same time, our results show that performance of current models for predicting polar expressions is poor, hampering the benefit of this information in practice.

pdf abs
Structured Sentiment Analysis as Dependency Graph Parsing
Jeremy Barnes | Robin Kurtz | Stephan Oepen | Lilja Øvrelid | Erik Velldal
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Structured sentiment analysis attempts to extract full opinion tuples from a text, but over time this task has been subdivided into smaller and smaller sub-tasks, e.g., target extraction or targeted polarity classification. We argue that this division has become counterproductive and propose a new unified framework to remedy the situation. We cast the structured sentiment problem as dependency graph parsing, where the nodes are spans of sentiment holders, targets and expressions, and the arcs are the relations between them. We perform experiments on five datasets in four languages (English, Norwegian, Basque, and Catalan) and show that this approach leads to strong improvements over state-of-the-art baselines. Our analysis shows that refining the sentiment graphs with syntactic dependency information further improves results.

pdf abs
Anonymisation Models for Text Data: State of the art, Challenges and Future Directions
Pierre Lison | Ildikó Pilán | David Sanchez | Montserrat Batet | Lilja Øvrelid
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This position paper investigates the problem of automated text anonymisation, which is a prerequisite for secure sharing of documents containing sensitive information about individuals. We summarise the key concepts behind text anonymisation and provide a review of current approaches. Anonymisation methods have so far been developed in two fields with little mutual interaction, namely natural language processing and privacy-preserving data publishing. Based on a case study, we outline the benefits and limitations of these approaches and discuss a number of open challenges, such as (1) how to account for multiple types of semantic inferences, (2) how to strike a balance between disclosure risk and data utility and (3) how to evaluate the quality of the resulting anonymisation. We lay out a case for moving beyond sequence labelling models and incorporate explicit measures of disclosure risk into the text anonymisation process.

pdf bib
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
Simon Dobnik | Lilja Øvrelid
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

pdf abs
Large-Scale Contextualised Language Modelling for Norwegian
Andrey Kutuzov | Jeremy Barnes | Erik Velldal | Lilja Øvrelid | Stephan Oepen
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

We present the ongoing NorLM initiative to support the creation and use of very large contextualised language models for Norwegian (and in principle other Nordic languages), including a ready-to-use software environment, as well as an experience report for data preparation and training. This paper introduces the first large-scale monolingual language models for Norwegian, based on both the ELMo and BERT frameworks. In addition to detailing the training process, we present contrastive benchmark results on a suite of NLP tasks for Norwegian. For additional background and access to the data, models, and software, please see: http://norlm.nlpl.eu

pdf abs
Negation in Norwegian: an annotated dataset
Petter Mæhlum | Jeremy Barnes | Robin Kurtz | Lilja Øvrelid | Erik Velldal
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

This paper introduces NorecNeg – the first annotated dataset of negation for Norwegian. Negation cues and their in-sentence scopes have been annotated across more than 11K sentences spanning more than 400 documents for a subset of the Norwegian Review Corpus (NoReC). In addition to providing in-depth discussion of the annotation guidelines, we also present a first set of benchmark results based on a graph-parsing approach.

pdf abs
Multilingual ELMo and the Effects of Corpus Sampling
Vinit Ravishankar | Andrey Kutuzov | Lilja Øvrelid | Erik Velldal
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

Multilingual pretrained language models are rapidly gaining popularity in NLP systems for non-English languages. Most of these models feature an important corpus sampling step in the process of accumulating training data in different languages, to ensure that the signal from better resourced languages does not drown out poorly resourced ones. In this study, we train multiple multilingual recurrent language models, based on the ELMo architecture, and analyse both the effect of varying corpus size ratios on downstream performance, as well as the performance difference between monolingual models for each language, and broader multilingual language models. As part of this effort, we also make these trained models available for public use.

2020

pdf abs
Gender and sentiment, critics and authors: a dataset of Norwegian book reviews
Samia Touileb | Lilja Øvrelid | Erik Velldal
Proceedings of the Second Workshop on Gender Bias in Natural Language Processing

Gender bias in models and datasets is widely studied in NLP. The focus has usually been on analysing how females and males express themselves, or how females and males are described. However, a less studied aspect is the combination of these two perspectives, how female and male describe the same or opposite gender. In this paper, we present a new gender annotated sentiment dataset of critics reviewing the works of female and male authors. We investigate if this newly annotated dataset contains differences in how the works of male and female authors are critiqued, in particular in terms of positive and negative sentiment. We also explore the differences in how this is done by male and female critics. We show that there are differences in how critics assess the works of authors of the same or opposite gender. For example, male critics rate crime novels written by females, and romantic and sentimental works written by males, more negatively.

pdf bib abs
Building a Norwegian Lexical Resource for Medical Entity Recognition
Ildiko Pilan | Pål H. Brekke | Lilja Øvrelid
Proceedings of the LREC 2020 Workshop on Multilingual Biomedical Text Processing (MultilingualBIO 2020)

We present a large Norwegian lexical resource of categorized medical terms. The resource, which merges information from large medical databases, contains over 56,000 entries, including automatically mapped terms from a Norwegian medical dictionary. We describe the methodology behind this automatic dictionary entry mapping based on keywords and suffixes and further present the results of a manual evaluation performed on a subset by a domain expert. The evaluation indicated that ca. 80% of the mappings were correct.

Loss of consciousness, so-called syncope, is a commonly occurring symptom associated with worse prognosis for a number of heart-related diseases. We present a comparison of methods for a diagnosis classification task in Norwegian clinical notes, targeting syncope, i.e. fainting cases. We find that an often neglected baseline with keyword matching constitutes a rather strong basis, but more advanced methods do offer some improvement in classification performance, especially a convolutional neural network model. The developed pipeline is planned to be used for quantifying unregistered syncope cases in Norway.

pdf abs
A Tale of Three Parsers: Towards Diagnostic Evaluation for Meaning Representation Parsing
Maja Buljan | Joakim Nivre | Stephan Oepen | Lilja Øvrelid
Proceedings of the Twelfth Language Resources and Evaluation Conference

We discuss methodological choices in contrastive and diagnostic evaluation in meaning representation parsing, i.e. mapping from natural language utterances to graph-based encodings of its semantic structure. Drawing inspiration from earlier work in syntactic dependency parsing, we transfer and refine several quantitative diagnosis techniques for use in the context of the 2019 shared task on Meaning Representation Parsing (MRP). As in parsing proper, moving evaluation from simple rooted trees to general graphs brings along its own range of challenges. Specifically, we seek to begin to shed light on relative strenghts and weaknesses in different broad families of parsing techniques. In addition to these theoretical reflections, we conduct a pilot experiment on a selection of top-performing MRP systems and one of the five meaning representation frameworks in the shared task. Empirical results suggest that the proposed methodology can be meaningfully applied to parsing into graph-structured target representations, uncovering hitherto unknown properties of the different systems that can inform future development and cross-fertilization across approaches.

pdf abs
NorNE: Annotating Named Entities for Norwegian
Fredrik Jørgensen | Tobias Aasmoe | Anne-Stine Ruud Husevåg | Lilja Øvrelid | Erik Velldal
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper presents NorNE, a manually annotated corpus of named entities which extends the annotation of the existing Norwegian Dependency Treebank. Comprising both of the official standards of written Norwegian (Bokmål and Nynorsk), the corpus contains around 600,000 tokens and annotates a rich set of entity types including persons, organizations, locations, geo-political entities, products, and events, in addition to a class corresponding to nominals derived from names. We here present details on the annotation effort, guidelines, inter-annotator agreement and an experimental analysis of the corpus using a neural sequence labeling architecture.

pdf abs
A Fine-grained Sentiment Dataset for Norwegian
Lilja Øvrelid | Petter Mæhlum | Jeremy Barnes | Erik Velldal
Proceedings of the Twelfth Language Resources and Evaluation Conference

We here introduce NoReC_fine, a dataset for fine-grained sentiment analysis in Norwegian, annotated with respect to polar expressions, targets and holders of opinion. The underlying texts are taken from a corpus of professionally authored reviews from multiple news-sources and across a wide variety of domains, including literature, games, music, products, movies and more. We here present a detailed description of this annotation effort. We provide an overview of the developed annotation guidelines, illustrated with examples and present an analysis of inter-annotator agreement. We also report the first experimental results on the dataset, intended as a preliminary benchmark for further experiments.

2019

pdf abs
Probing Multilingual Sentence Representations With X-Probe
Vinit Ravishankar | Lilja Øvrelid | Erik Velldal
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

This paper extends the task of probing sentence representations for linguistic insight in a multilingual domain. In doing so, we make two contributions: first, we provide datasets for multilingual probing, derived from Wikipedia, in five languages, viz. English, French, German, Spanish and Russian. Second, we evaluate six sentence encoders for each language, each trained by mapping sentence representations to English sentence representations, using sentences in a parallel corpus. We discover that cross-lingually mapped representations are often better at retaining certain linguistic information than representations derived from English encoders trained on natural language inference (NLI) as a downstream task.

pdf abs
Regression or classification? Automated Essay Scoring for Norwegian
Stig Johan Berggren | Taraka Rama | Lilja Øvrelid
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

In this paper we present first results for the task of Automated Essay Scoring for Norwegian learner language. We analyze a number of properties of this task experimentally and assess (i) the formulation of the task as either regression or classification, (ii) the use of various non-neural and neural machine learning architectures with various types of input representations, and (iii) applying multi-task learning for joint prediction of essay scoring and native language identification. We find that a GRU-based attention model trained in a single-task setting performs best at the AES task.

pdf abs
One-to-X Analogical Reasoning on Word Embeddings: a Case for Diachronic Armed Conflict Prediction from News Texts
Andrey Kutuzov | Erik Velldal | Lilja Øvrelid
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

We extend the well-known word analogy task to a one-to-X formulation, including one-to-none cases, when no correct answer exists. The task is cast as a relation discovery problem and applied to historical armed conflicts datasets, attempting to predict new relations of type ‘location:armed-group’ based on data about past events. As the source of semantic information, we use diachronic word embedding models trained on English news texts. A simple technique to improve diachronic performance in such task is demonstrated, using a threshold based on a function of cosine distance to decrease the number of false positives; this approach is shown to be beneficial on two different corpora. Finally, we publish a ready-to-use test set for one-to-X analogy evaluation on historical armed conflicts data.

pdf bib abs
Sentiment Analysis Is Not Solved! Assessing and Probing Sentiment Classification
Jeremy Barnes | Lilja Øvrelid | Erik Velldal
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Neural methods for sentiment analysis have led to quantitative improvements over previous approaches, but these advances are not always accompanied with a thorough analysis of the qualitative differences. Therefore, it is not clear what outstanding conceptual challenges for sentiment analysis remain. In this work, we attempt to discover what challenges still prove a problem for sentiment classifiers for English and to provide a challenging dataset. We collect the subset of sentences that an (oracle) ensemble of state-of-the-art sentiment classifiers misclassify and then annotate them for 18 linguistic and paralinguistic phenomena, such as negation, sarcasm, modality, etc. Finally, we provide a case study that demonstrates the usefulness of the dataset to probe the performance of a given sentiment classifier with respect to linguistic phenomena.

pdf abs
Annotating evaluative sentences for sentiment analysis: a dataset for Norwegian
Petter Mæhlum | Jeremy Barnes | Lilja Øvrelid | Erik Velldal
Proceedings of the 22nd Nordic Conference on Computational Linguistics

This paper documents the creation of a large-scale dataset of evaluative sentences – i.e. both subjective and objective sentences that are found to be sentiment-bearing – based on mixed-domain professional reviews from various news-sources. We present both the annotation scheme and first results for classification experiments. The effort represents a step toward creating a Norwegian dataset for fine-grained sentiment analysis.

pdf abs
Lexicon information in neural sentiment analysis: a multi-task learning approach
Jeremy Barnes | Samia Touileb | Lilja Øvrelid | Erik Velldal
Proceedings of the 22nd Nordic Conference on Computational Linguistics

This paper explores the use of multi-task learning (MTL) for incorporating external knowledge in neural models. Specifically, we show how MTL can enable a BiLSTM sentiment classifier to incorporate information from sentiment lexicons. Our MTL set-up is shown to improve model performance (compared to a single-task set-up) on both English and Norwegian sentence-level sentiment datasets. The paper also introduces a new sentiment lexicon for Norwegian.

pdf abs
Multilingual Probing of Deep Pre-Trained Contextual Encoders
Vinit Ravishankar | Memduh Gökırmak | Lilja Øvrelid | Erik Velldal
Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing

Encoders that generate representations based on context have, in recent years, benefited from adaptations that allow for pre-training on large text corpora. Earlier work on evaluating fixed-length sentence representations has included the use of ‘probing’ tasks, that use diagnostic classifiers to attempt to quantify the extent to which these encoders capture specific linguistic phenomena. The principle of probing has also resulted in extended evaluations that include relatively newer word-level pre-trained encoders. We build on probing tasks established in the literature and comprehensively evaluate and analyse – from a typological perspective amongst others – multilingual variants of existing encoders on probing datasets constructed for 6 non-English languages. Specifically, we probe each layer of a multiple monolingual RNN-based ELMo models, the transformer-based BERT’s cased and uncased multilingual variants, and a variant of BERT that uses a cross-lingual modelling scheme (XLM).

pdf abs
Reinforcement-based denoising of distantly supervised NER with partial annotation
Farhad Nooralahzadeh | Jan Tore Lønning | Lilja Øvrelid
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

Existing named entity recognition (NER) systems rely on large amounts of human-labeled data for supervision. However, obtaining large-scale annotated data is challenging particularly in specific domains like health-care, e-commerce and so on. Given the availability of domain specific knowledge resources, (e.g., ontologies, dictionaries), distant supervision is a solution to generate automatically labeled training data to reduce human effort. The outcome of distant supervision for NER, however, is often noisy. False positive and false negative instances are the main issues that reduce performance on this kind of auto-generated data. In this paper, we explore distant supervision in a supervised setup. We adopt a technique of partial annotation to address false negative cases and implement a reinforcement learning strategy with a neural network policy to identify false positive instances. Our results establish a new state-of-the-art on four benchmark datasets taken from different domains and different languages. We then go on to show that our model reduces the amount of manually annotated data required to perform NER in a new domain.

2018

pdf
Evaluation of Domain-specific Word Embeddings using Knowledge Resources
Farhad Nooralahzadeh | Lilja Øvrelid | Jan Tore Lønning
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf
The LIA Treebank of Spoken Norwegian Dialects
Lilja Øvrelid | Andre Kåsen | Kristin Hagen | Anders Nøklestad | Per Erik Solberg | Janne Bondi Johannessen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf abs
SIRIUS-LTG-UiO at SemEval-2018 Task 7: Convolutional Neural Networks with Shortest Dependency Paths for Semantic Relation Extraction and Classification in Scientific Papers
Farhad Nooralahzadeh | Lilja Øvrelid | Jan Tore Lønning
Proceedings of the 12th International Workshop on Semantic Evaluation

This article presents the SIRIUS-LTG-UiO system for the SemEval 2018 Task 7 on Semantic Relation Extraction and Classification in Scientific Papers. First we extract the shortest dependency path (sdp) between two entities, then we introduce a convolutional neural network (CNN) which takes the shortest dependency path embeddings as input and performs relation classification with differing objectives for each subtask of the shared task. This approach achieved overall F1 scores of 76.7 and 83.2 for relation classification on clean and noisy data, respectively. Furthermore, for combined relation extraction and classification on clean data, it obtained F1 scores of 37.4 and 33.6 for each phase. Our system ranks 3rd in all three sub-tasks of the shared task.

pdf abs
Diachronic word embeddings and semantic shifts: a survey
Andrey Kutuzov | Lilja Øvrelid | Terrence Szymanski | Erik Velldal
Proceedings of the 27th International Conference on Computational Linguistics

Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models. However, this vein of research lacks the cohesion, common terminology and shared practices of more established areas of natural language processing. In this paper, we survey the current state of academic research related to diachronic word embeddings and semantic shifts detection. We start with discussing the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models. We propose several axes along which these methods can be compared, and outline the main challenges before this emerging subfield of NLP, as well as prospects and possible applications.

pdf bib abs
The 2018 Shared Task on Extrinsic Parser Evaluation: On the Downstream Utility of English Universal Dependency Parsers
Murhaf Fares | Stephan Oepen | Lilja Øvrelid | Jari Björne | Richard Johansson
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

We summarize empirical results and tentative conclusions from the Second Extrinsic Parser Evaluation Initiative (EPE 2018). We review the basic task setup, downstream applications involved, and end-to-end results for seventeen participating teams. Based on in-depth quantitative and qualitative analysis, we correlate intrinsic evaluation results at different layers of morph-syntactic analysis with observed downstream behavior.

pdf abs
Syntactic Dependency Representations in Neural Relation Classification
Farhad Nooralahzadeh | Lilja Øvrelid
Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP

We investigate the use of different syntactic dependency representations in a neural relation classification task and compare the CoNLL, Stanford Basic and Universal Dependencies schemes. We further compare with a syntax-agnostic approach and perform an error analysis in order to gain a better understanding of the results.

pdf abs
SIRIUS-LTG: An Entity Linking Approach to Fact Extraction and Verification
Farhad Nooralahzadeh | Lilja Øvrelid
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

This article presents the SIRIUS-LTG system for the Fact Extraction and VERification (FEVER) Shared Task. It consists of three components: 1) Wikipedia Page Retrieval: First we extract the entities in the claim, then we find potential Wikipedia URI candidates for each of the entities using a SPARQL query over DBpedia 2) Sentence selection: We investigate various techniques i.e. Smooth Inverse Frequency (SIF), Word Mover’s Distance (WMD), Soft-Cosine Similarity, Cosine similarity with unigram Term Frequency Inverse Document Frequency (TF-IDF) to rank sentences by their similarity to the claim. 3) Textual Entailment: We compare three models for the task of claim classification. We apply a Decomposable Attention (DA) model (Parikh et al., 2016), a Decomposed Graph Entailment (DGE) model (Khot et al., 2018) and a Gradient-Boosted Decision Trees (TalosTree) model (Sean et al., 2017) for this task. The experiments show that the pipeline with simple Cosine Similarity using TFIDF in sentence selection along with DA model as labelling model achieves the best results on the development set (F1 evidence: 32.17, label accuracy: 59.61 and FEVER score: 0.3778). Furthermore, it obtains 30.19, 48.87 and 36.55 in terms of F1 evidence, label accuracy and FEVER score, respectively, on the test set. Our system ranks 15th among 23 participants in the shared task prior to any human-evaluation of the evidence.

pdf abs
Iterative development of family history annotation guidelines using a synthetic corpus of clinical text
Taraka Rama | Pål Brekke | Øystein Nytrø | Lilja Øvrelid
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis

In this article, we describe the development of annotation guidelines for family history information in Norwegian clinical text. We make use of incrementally developed synthetic clinical text describing patients’ family history relating to cases of cardiac disease and present a general methodology which integrates the synthetically produced clinical statements and guideline development. We analyze inter-annotator agreement based on the developed guidelines and present results from experiments aimed at evaluating the validity and applicability of the annotated corpus using machine learning techniques. The resulting annotated corpus contains 477 sentences and 6030 tokens. Both the annotation guidelines and the annotated corpus are made freely available and as such constitutes the first publicly available resource of Norwegian clinical text.

Although treebanks annotated according to the guidelines of Universal Dependencies (UD) now exist for many languages, the goal of annotating the same phenomena in a cross-linguistically consistent fashion is not always met. In this paper, we investigate one phenomenon where we believe such consistency is lacking, namely expletive elements. Such elements occupy a position that is structurally associated with a core argument (or sometimes an oblique dependent), yet are non-referential and semantically void. Many UD treebanks identify at least some elements as expletive, but the range of phenomena differs between treebanks, even for closely related languages, and sometimes even for different treebanks for the same language. In this paper, we present criteria for identifying expletives that are applicable across languages and compatible with the goals of UD, give an overview of expletives as found in current UD treebanks, and present recommendations for the annotation of expletives so that more consistent annotation can be achieved in future releases.

2017

pdf bib
Joint UD Parsing of Norwegian Bokmål and Nynorsk
Erik Velldal | Lilja Øvrelid | Petter Hohle
Proceedings of the 21st Nordic Conference on Computational Linguistics

pdf
Optimizing a PoS Tagset for Norwegian Dependency Parsing
Petter Hohle | Lilja Øvrelid | Erik Velldal
Proceedings of the 21st Nordic Conference on Computational Linguistics

pdf
Wordnet extension via word embeddings: Experiments on the Norwegian Wordnet
Heidi Sand | Erik Velldal | Lilja Øvrelid
Proceedings of the 21st Nordic Conference on Computational Linguistics

pdf abs
An open-source tool for negation detection: a maximum-margin approach
Martine Enger | Erik Velldal | Lilja Øvrelid
Proceedings of the Workshop Computational Semantics Beyond Events and Roles

This paper presents an open-source toolkit for negation detection. It identifies negation cues and their corresponding scope in either raw or parsed text using maximum-margin classification. The system design draws on best practice from the existing literature on negation detection, aiming for a simple and portable system that still achieves competitive performance. Pre-trained models and experimental results are provided for English.

pdf abs
Tracing armed conflicts with diachronic word embedding models
Andrey Kutuzov | Erik Velldal | Lilja Øvrelid
Proceedings of the Events and Stories in the News Workshop

Recent studies have shown that word embedding models can be used to trace time-related (diachronic) semantic shifts in particular words. In this paper, we evaluate some of these approaches on the new task of predicting the dynamics of global armed conflicts on a year-to-year basis, using a dataset from the conflict research field as the gold standard and the Gigaword news corpus as the training data. The results show that much work still remains in extracting ‘cultural’ semantic shifts from diachronic word embedding models. At the same time, we present a new task complete with an evaluation set and introduce the ‘anchor words’ method which outperforms previous approaches on this set.

pdf bib
Downstream use of syntactic analysis: does representation matter?
Lilja Øvrelid
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories

pdf abs
Temporal dynamics of semantic relations in word embeddings: an application to predicting armed conflict participants
Andrey Kutuzov | Erik Velldal | Lilja Øvrelid
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This paper deals with using word embedding models to trace the temporal dynamics of semantic relations between pairs of words. The set-up is similar to the well-known analogies task, but expanded with a time dimension. To this end, we apply incremental updating of the models with new training texts, including incremental vocabulary expansion, coupled with learned transformation matrices that let us map between members of the relation. The proposed approach is evaluated on the task of predicting insurgent armed groups based on geographical locations. The gold standard data for the time span 1994–2010 is extracted from the UCDP Armed Conflicts dataset. The results show that the method is feasible and outperforms the baselines, but also that important work still remains to be done.

2016

pdf
Redefining part-of-speech classes with distributional semantic models
Andrey Kutuzov | Erik Velldal | Lilja Øvrelid
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning

pdf abs
Universal Dependencies for Norwegian
Lilja Øvrelid | Petter Hohle
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This article describes the conversion of the Norwegian Dependency Treebank to the Universal Dependencies scheme. This paper details the mapping of PoS tags, morphological features and dependency relations and provides a description of the structural changes made to NDT analyses in order to make it compliant with the UD guidelines. We further present PoS tagging and dependency parsing experiments which report first results for the processing of the converted treebank. The full converted treebank was made available with the 1.2 release of the UD treebanks.

pdf
Threat detection in online discussions
Aksel Wester | Lilja Øvrelid | Erik Velldal | Hugo Lewi Hammer
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

2015

pdf
Improving cross-domain dependency parsing with dependency-derived clusters
Jostein Lien | Erik Velldal | Lilja Øvrelid
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

2014

pdf
Sentiment classification of online political discussions: a comparison of a word-based and dependency-based method
Hugo Lewi Hammer | Per Erik Solberg | Lilja Øvrelid
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf abs
The Norwegian Dependency Treebank
Per Erik Solberg | Arne Skjærholt | Lilja Øvrelid | Kristin Hagen | Janne Bondi Johannessen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The Norwegian Dependency Treebank is a new syntactic treebank for Norwegian Bokmäl and Nynorsk with manual syntactic and morphological annotation, developed at the National Library of Norway in collaboration with the University of Oslo. It is the first publically available treebank for Norwegian. This paper presents the core principles behind the syntactic annotation and how these principles were employed in certain specific cases. We then present the selection of texts and distribution between genres, as well as the annotation process and an evaluation of the inter-annotator agreement. Finally, we present the first results of data-driven dependency parsing of Norwegian, contrasting four state-of-the-art dependency parsers trained on the treebank. The consistency and the parsability of this treebank is shown to be comparable to other large treebank initiatives.

2013

pdf
On Different Approaches to Syntactic Analysis Into Bi-Lexical Dependencies. An Empirical Comparison of Direct, PCFG-Based, and HPSG-Based Parsers
Angelina Ivanova | Stephan Oepen | Rebecca Dridan | Dan Flickinger | Lilja Øvrelid
Proceedings of the 13th International Conference on Parsing Technologies (IWPT 2013)

pdf
Survey on parsing three dependency representations for English
Angelina Ivanova | Stephan Oepen | Lilja Øvrelid
51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Student Research Workshop

2012

pdf
Speculation and Negation: Rules, Rankers, and the Role of Syntax
Erik Velldal | Lilja Øvrelid | Jonathon Read | Stephan Oepen
Computational Linguistics, Volume 38, Issue 2 - June 2012

pdf abs
The WeSearch Corpus, Treebank, and Treecache – A Comprehensive Sample of User-Generated Content
Jonathon Read | Dan Flickinger | Rebecca Dridan | Stephan Oepen | Lilja Øvrelid
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present the WeSearch Data Collection (WDC)―a freely redistributable, partly annotated, comprehensive sample of User-Generated Content. The WDC contains data extracted from a range of genres of varying formality (user forums, product review sites, blogs and Wikipedia) and covers two different domains (NLP and Linux). In this article, we describe the data selection and extraction process, with a focus on the extraction of linguistic content from different sources. We present the format of syntacto-semantic annotations found in this resource and present initial parsing results for these data, as well as some reflections following a first round of treebanking.

pdf bib
Who Did What to Whom? A Contrastive Study of Syntacto-Semantic Dependencies
Angelina Ivanova | Stephan Oepen | Lilja Øvrelid | Dan Flickinger
Proceedings of the Sixth Linguistic Annotation Workshop

pdf
UiO1: Constituent-Based Discriminative Ranking for Negation Resolution
Jonathon Read | Erik Velldal | Lilja Øvrelid | Stephan Oepen
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf
UiO 2: Sequence-labeling Negation Using Dependency Features
Emanuele Lapponi | Erik Velldal | Lilja Øvrelid | Jonathon Read
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf
Lexical Categories for Improved Parsing of Web Data
Lilja Øvrelid | Arne Skjærholt
Proceedings of COLING 2012: Posters

2010

pdf
Resolving Speculation: MaxEnt Cue Classification and Dependency-Based Scope Rules
Erik Velldal | Lilja Øvrelid | Stephan Oepen
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task

pdf
Syntactic Scope Resolution in Uncertainty Analysis
Lilja Øvrelid | Erik Velldal | Stephan Oepen
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf
Informed ways of improving data-driven dependency parsing for German
Wolfgang Seeker | Bernd Bohnet | Lilja Øvrelid | Jonas Kuhn
Coling 2010: Posters

pdf abs
Towards a Large Parallel Corpus of Cleft Constructions
Gerlof Bouma | Lilja Øvrelid | Jonas Kuhn
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present our efforts to create a large-scale, semi-automatically annotated parallel corpus of cleft constructions. The corpus is intended to reduce or make more effective the manual task of finding examples of clefts in a corpus. The corpus is being developed in the context of the Collaborative Research Centre SFB 632, which is a large, interdisciplinary research initiative to study information structure, at the University of Potsdam and the Humboldt University in Berlin. The corpus is based on the Europarl corpus (version 3). We show how state-of-the-art NLP tools, like POS taggers and statistical dependency parsers, may facilitate powerful and precise searches. We argue that identifying clefts using automatically added syntactic structure annotation is ultimately to be preferred over using lower level, though more robust, extraction methods like regular expression matching. An evaluation of the extraction method for one of the languages also offers some support for this method. We end the paper by discussing the resulting corpus itself. We present some examples of interesting clefts and translational counterparts from the corpus and suggest ways of exploiting our newly created resource in the cross-linguistic study of clefts.

pdf abs
Training Parsers on Partial Trees: A Cross-language Comparison
Kathrin Spreyer | Lilja Øvrelid | Jonas Kuhn
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present a study that compares data-driven dependency parsers obtained by means of annotation projection between language pairs of varying structural similarity. We show how the partial dependency trees projected from English to Dutch, Italian and German can be exploited to train parsers for the target languages. We evaluate the parsers against manual gold standard annotations and find that the projected parsers substantially outperform our heuristic baselines by 9―25% UAS, which corresponds to a 21―43% reduction in error rate. A comparative error analysis focuses on how the projected target language parsers handle subjects, which is especially interesting for Italian as an instance of a pro-drop language. For Dutch, we further present experiments with German as an alternative source language. In both source languages, we contrast standard baseline parsers with parsers that are enhanced with the predictions from large-scale LFG grammars through a technique of parser stacking, and show that improvements of the source language parser can directly lead to similar improvements of the projected target language parser.