Joakim Nivre

2023

Investigating UD Treebanks via Dataset Difficulty Measures
Artur Kulmizev | Joakim Nivre
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Treebanks annotated with Universal Dependencies (UD) are currently available for over 100 languages and are widely utilized by the community. However, their inherent characteristics are hard to measure and are only partially reflected in parser evaluations via accuracy metrics like LAS. In this study, we analyze a large subset of the UD treebanks using three recently proposed accuracy-free dataset analysis methods: dataset cartography, 𝒱-information, and minimum description length. Each method provides insights about UD treebanks that would remain undetected if only LAS was considered. Specifically, we identify a number of treebanks that, despite yielding high LAS, contain very little information that is usable by a parser to surpass what can be achieved by simple heuristics. Furthermore, we make note of several treebanks that score consistently low across numerous metrics, indicating a high degree of noise or annotation inconsistency present therein.

On the Concept of Resource-Efficiency in NLP
Luise Dürlich | Evangelia Gogoulou | Joakim Nivre
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

Resource-efficiency is a growing concern in the NLP community. But what are the resources we care about and why? How do we measure efficiency in a way that is reliable and relevant? And how do we balance efficiency and other important concerns? Based on a review of the emerging literature on the subject, we discuss different ways of conceptualizing efficiency in terms of product and cost, using a simple case study on fine-tuning and knowledge distillation for illustration. We propose a novel metric of amortized efficiency that is better suited for life-cycle analysis than existing metrics.

Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023)
Nikolai Ilinykh | Felix Morger | Dana Dannélls | Simon Dobnik | Beáta Megyesi | Joakim Nivre
Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023)

What Causes Unemployment? Unsupervised Causality Mining from Swedish Governmental Reports
Luise Dürlich | Joakim Nivre | Sara Stymne
Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023)

Extracting statements about causality from text documents is a challenging task in the absence of annotated training data. We create a search system for causal statements about user-specified concepts by combining pattern matching of causal connectives with semantic similarity ranking, using a language model fine-tuned for semantic textual similarity. Preliminary experiments on a small test set from Swedish governmental reports show promising results in comparison to two simple baselines.

Low-Resource Techniques for Analysing the Rhetorical Structure of Swedish Historical Petitions
Ellinor Lindqvist | Eva Pettersson | Joakim Nivre
Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023)

Natural language processing techniques can be valuable for improving and facilitating historical research. This is also true for the analysis of petitions, a source which has been relatively little used in historical research. However, limited data resources pose challenges for mainstream natural language processing approaches based on machine learning. In this paper, we explore methods for automatically segmenting petitions according to their rhetorical structure. We find that the use of rules, word embeddings, and especially keywords can give promising results for this task.

2022

To the Most Gracious Highness, from Your Humble Servant: Analysing Swedish 18th Century Petitions Using Text Classification
Ellinor Lindqvist | Eva Pettersson | Joakim Nivre
Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Petitions are a rich historical source, yet they have been relatively little used in historical research. In this paper, we aim to analyse Swedish texts from around the 18th century, and petitions in particular, using automatic means of text classification. We also test how text pre-processing and different feature representations affect the result, and we examine feature importance for our main class of interest - petitions. Our experiments show that the statistical algorithms NB, RF, SVM, and kNN are indeed very able to classify different genres of historical text. Further, we find that normalisation has a positive impact on classification, and that content words are particularly informative for the traditional models. A fine-tuned BERT model, fed with normalised data, outperforms all other classification experiments with a macro average F1 score of 98.8. However, using less computationally expensive methods, including feature representation with word2vec, fastText embeddings or even TF-IDF values, with an SVM classifier also shows good results for both unnormalised and normalised data. In the feature importance analysis, where we obtain the features most decisive for the classification models, we find highly relevant characteristics of the petitions, namely words expressing signs of someone inferior addressing someone superior.

Fine-Grained Controllable Text Generation Using Non-Residual Prompting
Fredrik Carlsson | Joey Öhman | Fangyu Liu | Severine Verlinden | Joakim Nivre | Magnus Sahlgren
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The introduction of immensely large Causal Language Models (CLMs) has rejuvenated the interest in open-ended text generation. However, controlling the generative process for these Transformer-based models is by and large an unsolved problem. Earlier work has explored either plug-and-play decoding strategies, or more powerful but blunt approaches such as prompting. There hence currently exists a trade-off between fine-grained control and the capability for more expressive high-level instructions. To alleviate this trade-off, we propose an encoder-decoder architecture that enables intermediate text prompts at arbitrary time steps. We propose a resource-efficient method for converting a pre-trained CLM into this architecture, and demonstrate its potential on various experiments, including the novel task of contextualized word inclusion. Our method provides strong results on multiple experimental settings, proving itself to be both expressive and versatile.

The Effects of Corpus Choice and Morphosyntax on Multilingual Space Induction
Vinit Ravishankar | Joakim Nivre
Findings of the Association for Computational Linguistics: EMNLP 2022

In an effort to study the inductive biases of language models, numerous studies have attempted to use linguistically motivated tasks as a proxy of sorts, wherein performance on these tasks would imply an inductive bias towards a specific linguistic phenomenon. In this study, we attempt to analyse the inductive biases of language models with respect to natural language phenomena, in the context of building multilingual embedding spaces. We sample corpora from 2 sources in 15 languages and train language models on pseudo-bilingual variants of each corpus, created by duplicating each corpus and shifting token indices for half the resulting corpus. We evaluate the cross-lingual capabilities of these LMs, and show that while correlations with language families tend to be weak, other corpus-level characteristics, such as type-token ratio, tend to be more strongly correlated. Finally, we show that multilingual spaces can be built, albeit less effectively, even when additional destructive perturbations are applied to the training corpora, implying that (effectively) bag-of-words models also have an inductive bias that is sufficient for inducing multilingual spaces.

Nucleus Composition in Transition-based Dependency Parsing
Joakim Nivre | Ali Basirat | Luise Dürlich | Adam Moss
Computational Linguistics, Volume 48, Issue 4 - December 2022

Dependency-based approaches to syntactic analysis assume that syntactic structure can be analyzed in terms of binary asymmetric dependency relations holding between elementary syntactic units. Computational models for dependency parsing almost universally assume that an elementary syntactic unit is a word, while the influential theory of Lucien Tesnière instead posits a more abstract notion of nucleus, which may be realized as one or more words. In this article, we investigate the effect of enriching computational parsing models with a concept of nucleus inspired by Tesnière. We begin by reviewing how the concept of nucleus can be defined in the framework of Universal Dependencies, which has become the de facto standard for training and evaluating supervised dependency parsers, and explaining how composition functions can be used to make neural transition-based dependency parsers aware of the nuclei thus defined. We then perform an extensive experimental study, using data from 20 languages to assess the impact of nucleus composition across languages with different typological characteristics, and utilizing a variety of analytical tools including ablation, linear mixed-effects models, diagnostic classifiers, and dimensionality reduction. The analysis reveals that nucleus composition gives small but consistent improvements in parsing accuracy for most languages, and that the improvement mainly concerns the analysis of main predicates, nominal dependents, clausal dependents, and coordination structures. Significant factors explaining the rate of improvement across languages include entropy in coordination structures and frequency of certain function words, in particular determiners. Analysis using dimensionality reduction and diagnostic classifiers suggests that nucleus composition increases the similarity of vectors representing nuclei of the same syntactic type.

Cause and Effect in Governmental Reports: Two Data Sets for Causality Detection in Swedish
Luise Dürlich | Sebastian Reimann | Gustav Finnveden | Joakim Nivre | Sara Stymne
Proceedings of the LREC 2022 workshop on Natural Language Processing for Political Sciences

Causality detection is the task of extracting information about causal relations from text. It is an important task for different types of document analysis, including political impact assessment. We present two new data sets for causality detection in Swedish. The first data set is annotated with binary relevance judgments, indicating whether a sentence contains causality information or not. In the second data set, sentence pairs are ranked for relevance with respect to a causality query, containing a specific hypothesized cause and/or effect. Both data sets are carefully curated and mainly intended for use as test data. We describe the data sets and their annotation, including detailed annotation guidelines. In addition, we present pilot experiments on cross-lingual zero-shot and few-shot causality detection, using training data from English and German.

2021

Universal Dependencies
Marie-Catherine de Marneffe | Christopher D. Manning | Joakim Nivre | Daniel Zeman
Computational Linguistics, Volume 47, Issue 2 - June 2021

Universal dependencies (UD) is a framework for morphosyntactic annotation of human language, which to date has been used to create treebanks for more than 100 languages. In this article, we outline the linguistic theory of the UD framework, which draws on a long tradition of typologically oriented grammatical theories. Grammatical relations between words are centrally used to explain how predicate–argument structures are encoded morphosyntactically in different languages while morphological features and part-of-speech classes give the properties of words. We argue that this theory is a good basis for crosslinguistically consistent annotation of typologically diverse languages in a way that supports computational natural language understanding as well as broader linguistic studies.

Syntactic Nuclei in Dependency Parsing – A Multilingual Exploration
Ali Basirat | Joakim Nivre
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Standard models for syntactic dependency parsing take words to be the elementary units that enter into dependency relations. In this paper, we investigate whether there are any benefits from enriching these models with the more abstract notion of nucleus proposed by Tesnière. We do this by showing how the concept of nucleus can be defined in the framework of Universal Dependencies and how we can use composition functions to make a transition-based dependency parser aware of this concept. Experiments on 12 languages show that nucleus composition gives small but significant improvements in parsing accuracy. Further analysis reveals that the improvement mainly concerns a small number of dependency relations, including nominal modifiers, relations of coordination, main predicates, and direct objects.

Attention Can Reflect Syntactic Structure (If You Let It)
Vinit Ravishankar | Artur Kulmizev | Mostafa Abdou | Anders Søgaard | Joakim Nivre
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Since the popularization of the Transformer as a general-purpose feature encoder for NLP, many studies have attempted to decode linguistic structure from its novel multi-head attention mechanism. However, much of such work focused almost exclusively on English — a language with rigid word order and a lack of inflectional morphology. In this study, we present decoding experiments for multilingual BERT across 18 languages in order to test the generalizability of the claim that dependency syntax is reflected in attention patterns. We show that full trees can be decoded above baseline accuracy from single attention heads, and that individual relations are often tracked by the same heads across languages. Furthermore, in an attempt to address recent debates about the status of attention as an explanatory mechanism, we experiment with fine-tuning mBERT on a supervised parsing objective while freezing different series of parameters. Interestingly, in steering the objective to learn explicit linguistic structure, we find much of the same structure represented in the resulting attention patterns, with interesting differences with respect to which parameters are frozen.

Revisiting Negation in Neural Machine Translation
Gongbo Tang | Philipp Rönchen | Rico Sennrich | Joakim Nivre
Transactions of the Association for Computational Linguistics, Volume 9

In this paper, we evaluate the translation of negation both automatically and manually, in English–German (EN–DE) and English–Chinese (EN–ZH). We show that the ability of neural machine translation (NMT) models to translate negation has improved with deeper and more advanced networks, although the performance varies between language pairs and translation directions. The accuracy of manual evaluation in EN→DE, DE→EN, EN→ZH, and ZH→EN is 95.7%, 94.8%, 93.4%, and 91.7%, respectively. In addition, we show that under-translation is the most significant error type in NMT, which contrasts with the more diverse error profile previously observed for statistical machine translation. To better understand the root of the under-translation of negation, we study the model’s information flow and training data. While our information flow analysis does not reveal any deficiencies that could be used to detect or fix the under-translation of negation, we find that negation is often rephrased during training, which could make it more difficult for the model to learn a reliable link between source and target negation. We finally conduct intrinsic analysis and extrinsic probing tasks on negation, showing that NMT models can distinguish negation and non-negation tokens very well and encode a lot of information about negation in hidden states but nevertheless leave room for improvement.

2020

Køpsala: Transition-Based Graph Parsing via Efficient Training and Effective Encoding
Daniel Hershcovich | Miryam de Lhoneux | Artur Kulmizev | Elham Pejhan | Joakim Nivre
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

We present Køpsala, the Copenhagen-Uppsala system for the Enhanced Universal Dependencies Shared Task at IWPT 2020. Our system is a pipeline consisting of off-the-shelf models for everything but enhanced graph parsing, and for the latter, a transition-based graph parser adapted from Che et al. (2019). We train a single enhanced parser model per language, using gold sentence splitting and tokenization for training, and rely only on tokenized surface forms and multilingual BERT for encoding. While a bug introduced just before submission resulted in a severe drop in precision, its post-submission fix would bring us to 4th place in the official ranking, according to average ELAS. Our parser demonstrates that a unified pipeline is effective for both Meaning Representation Parsing and Enhanced Universal Dependencies.

Do Neural Language Models Show Preferences for Syntactic Formalisms?
Artur Kulmizev | Vinit Ravishankar | Mostafa Abdou | Joakim Nivre
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recent work on the interpretability of deep neural language models has concluded that many properties of natural language syntax are encoded in their representational spaces. However, such studies often suffer from limited scope by focusing on a single language and a single linguistic formalism. In this study, we aim to investigate the extent to which the semblance of syntactic structure captured by language models adheres to a surface-syntactic or deep syntactic style of analysis, and whether the patterns are consistent across different languages. We apply a probe for extracting directed dependency trees to BERT and ELMo models trained on 13 different languages, probing for two different syntactic annotation styles: Universal Dependencies (UD), prioritizing deep syntactic relations, and Surface-Syntactic Universal Dependencies (SUD), focusing on surface structure. We find that both models exhibit a preference for UD over SUD — with interesting variations across languages and layers — and that the strength of this preference is correlated with differences in tree shape.

Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)
Marie-Catherine de Marneffe | Miryam de Lhoneux | Joakim Nivre | Sebastian Schuster
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)

Universal Dependencies for Albanian
Marsida Toska | Joakim Nivre | Daniel Zeman
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)

In this paper, we introduce the first Universal Dependencies (UD) treebank for standard Albanian, consisting of 60 sentences collected from the Albanian Wikipedia, annotated with lemmas, universal part-of-speech tags, morphological features and syntactic dependencies. In addition to presenting the treebank itself, we discuss a selection of linguistic constructions in Albanian whose analysis in UD is not self-evident, including core arguments and the status of indirect objects, pronominal clitics, genitive constructions, prearticulated adjectives, and modal verbs.

A Tale of Three Parsers: Towards Diagnostic Evaluation for Meaning Representation Parsing
Maja Buljan | Joakim Nivre | Stephan Oepen | Lilja Øvrelid
Proceedings of the Twelfth Language Resources and Evaluation Conference

We discuss methodological choices in contrastive and diagnostic evaluation in meaning representation parsing, i.e. mapping from natural language utterances to graph-based encodings of their semantic structure. Drawing inspiration from earlier work in syntactic dependency parsing, we transfer and refine several quantitative diagnosis techniques for use in the context of the 2019 shared task on Meaning Representation Parsing (MRP). As in parsing proper, moving evaluation from simple rooted trees to general graphs brings along its own range of challenges. Specifically, we seek to begin to shed light on relative strengths and weaknesses in different broad families of parsing techniques. In addition to these theoretical reflections, we conduct a pilot experiment on a selection of top-performing MRP systems and one of the five meaning representation frameworks in the shared task. Empirical results suggest that the proposed methodology can be meaningfully applied to parsing into graph-structured target representations, uncovering hitherto unknown properties of the different systems that can inform future development and cross-fertilization across approaches.

Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. The annotation consists in a linguistically motivated word segmentation; a morphological layer comprising lemmas, universal part-of-speech tags, and standardized morphological features; and a syntactic layer focusing on syntactic relations between predicates, arguments and modifiers. In this paper, we describe version 2 of the universal guidelines (UD v2), discuss the major changes from UD v1 to UD v2, and give an overview of the currently available treebanks for 90 languages.

What Should/Do/Can LSTMs Learn When Parsing Auxiliary Verb Constructions?
Miryam de Lhoneux | Sara Stymne | Joakim Nivre
Computational Linguistics, Volume 46, Issue 4 - December 2020

There is a growing interest in investigating what neural NLP models learn about language. A prominent open question is the question of whether or not it is necessary to model hierarchical structure. We present a linguistic investigation of a neural parser adding insights to this question. We look at transitivity and agreement information of auxiliary verb constructions (AVCs) in comparison to finite main verbs (FMVs). This comparison is motivated by theoretical work in dependency grammar and in particular the work of Tesnière (1959), where AVCs and FMVs are both instances of a nucleus, the basic unit of syntax. An AVC is a dissociated nucleus; it consists of at least two words, and an FMV is its non-dissociated counterpart, consisting of exactly one word. We suggest that the representation of AVCs and FMVs should capture similar information. We use diagnostic classifiers to probe agreement and transitivity information in vectors learned by a transition-based neural parser in four typologically different languages. We find that the parser learns different information about AVCs and FMVs if only sequential models (BiLSTMs) are used in the architecture but similar information when a recursive layer is used. We find explanations for why this is the case by looking closely at how information is learned in the network and looking at what happens with different dependency representations of AVCs. We conclude that there may be benefits to using a recursive layer in dependency parsing and that we have not yet found the best way to integrate it in our parsers.

Understanding Pure Character-Based Neural Machine Translation: The Case of Translating Finnish into English
Gongbo Tang | Rico Sennrich | Joakim Nivre
Proceedings of the 28th International Conference on Computational Linguistics

Recent work has shown that deeper character-based neural machine translation (NMT) models can outperform subword-based models. However, it is still unclear what makes deeper character-based models successful. In this paper, we conduct an investigation into pure character-based models in the case of translating Finnish into English, including exploring the ability to learn word senses and morphological inflections and the attention mechanism. We demonstrate that word-level information is distributed over the entire character sequence rather than over a single character, and characters at different positions play different roles in learning linguistic knowledge. In addition, character-based models need more layers to encode word senses, which explains why only deeper models outperform subword-based models. The attention distribution pattern shows that separators attract a lot of attention, and we explore a sparse word-level attention to force character hidden states to capture the full word-level information. Experimental results show that the word-level attention with a single head results in a drop of 1.2 BLEU points.

2019

How to Parse Low-Resource Languages: Cross-Lingual Parsing, Target Language Annotation, or Both?
Ailsa Meechan-Maddon | Joakim Nivre
Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)

Encoders Help You Disambiguate Word Senses in Neural Machine Translation
Gongbo Tang | Rico Sennrich | Joakim Nivre
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Neural machine translation (NMT) has achieved new state-of-the-art performance in translating ambiguous words. However, it is still unclear which component dominates the process of disambiguation. In this paper, we explore the ability of NMT encoders and decoders to disambiguate word senses by evaluating hidden states and investigating the distributions of self-attention. We train a classifier to predict whether a translation is correct given the representation of an ambiguous noun. We find that encoder hidden states outperform word embeddings significantly, which indicates that encoders adequately encode relevant information for disambiguation into hidden states. In contrast to encoders, the effect of the decoder differs between models with different architectures. Moreover, the attention weights and attention entropy show that self-attention can detect ambiguous nouns and distribute more attention to the context.

Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing - A Tale of Two Parsers Revisited
Artur Kulmizev | Miryam de Lhoneux | Johannes Gontrum | Elena Fano | Joakim Nivre
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Transition-based and graph-based dependency parsers have previously been shown to have complementary strengths and weaknesses: transition-based parsers exploit rich structural features but suffer from error propagation, while graph-based parsers benefit from global optimization but have restricted feature scope. In this paper, we show that, even though some details of the picture have changed after the switch to neural networks and continuous representations, the basic trade-off between rich features and global optimization remains essentially the same. Moreover, we show that deep contextualized word embeddings, which allow parsers to pack information about global sentence structure into local feature representations, benefit transition-based parsers more than graph-based parsers, making the two approaches virtually equivalent in terms of both accuracy and error profile. We argue that the reason is that these representations help prevent search errors and thereby allow transition-based parsers to better exploit their inherent strength of making accurate local decisions. We support this explanation by an error analysis of parsing experiments on 13 languages.

Understanding Neural Machine Translation by Simplification: The Case of Encoder-free Models
Gongbo Tang | Rico Sennrich | Joakim Nivre
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

In this paper, we try to understand neural machine translation (NMT) by simplifying NMT architectures and training encoder-free NMT models. In an encoder-free model, the sums of word embeddings and positional embeddings represent the source. The decoder is a standard Transformer or recurrent neural network that directly attends to embeddings via attention mechanisms. Experimental results show (1) that the attention mechanism in encoder-free models acts as a strong feature extractor, (2) that the word embeddings in encoder-free models are competitive with those in conventional models, (3) that non-contextualized source representations lead to a big performance drop, and (4) that encoder-free models have different effects on alignment quality for German-English and Chinese-English.

Recursive Subtree Composition in LSTM-Based Dependency Parsing
Miryam de Lhoneux | Miguel Ballesteros | Joakim Nivre
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The need for tree structure modelling on top of sequence modelling is an open issue in neural dependency parsing. We investigate the impact of adding a tree layer on top of a sequential model by recursively composing subtree representations (composition) in a transition-based parser that uses features extracted by a BiLSTM. Composition seems superfluous with such a model, suggesting that BiLSTMs capture information about subtrees. We perform model ablations to tease out the conditions under which composition helps. When ablating the backward LSTM, performance drops and composition does not recover much of the gap. When ablating the forward LSTM, performance drops less dramatically and composition recovers a substantial part of the gap, indicating that a forward LSTM and composition capture similar information. We take the backward LSTM to be related to lookahead features and the forward LSTM to the rich history-based features, both crucial for transition-based parsers. To capture history-based information, composition is better than a forward LSTM on its own, but it is even better to have a forward LSTM as part of a BiLSTM. We correlate results with language properties, showing that the improved lookahead of a backward LSTM is especially important for head-final languages.

2018

Although treebanks annotated according to the guidelines of Universal Dependencies (UD) now exist for many languages, the goal of annotating the same phenomena in a cross-linguistically consistent fashion is not always met. In this paper, we investigate one phenomenon where we believe such consistency is lacking, namely expletive elements. Such elements occupy a position that is structurally associated with a core argument (or sometimes an oblique dependent), yet are non-referential and semantically void. Many UD treebanks identify at least some elements as expletive, but the range of phenomena differs between treebanks, even for closely related languages, and sometimes even for different treebanks for the same language. In this paper, we present criteria for identifying expletives that are applicable across languages and compatible with the goals of UD, give an overview of expletives as found in current UD treebanks, and present recommendations for the annotation of expletives so that more consistent annotation can be achieved in future releases.

We evaluate two cross-lingual techniques for adding enhanced dependencies to existing treebanks in Universal Dependencies. We apply a rule-based system developed for English and a data-driven system trained on Finnish to Swedish and Italian. We find that both systems are accurate enough to bootstrap enhanced dependencies in existing UD treebanks. In the case of Italian, results are even on par with those of a prototype language-specific system.

pdf abs
An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation
Gongbo Tang | Rico Sennrich | Joakim Nivre
Proceedings of the Third Conference on Machine Translation: Research Papers

Recent work has shown that the encoder-decoder attention mechanisms in neural machine translation (NMT) are different from the word alignment in statistical machine translation. In this paper, we focus on analyzing encoder-decoder attention mechanisms, in the case of word sense disambiguation (WSD) in NMT models. We hypothesize that attention mechanisms pay more attention to context tokens when translating ambiguous words. We explore the attention distribution patterns when translating ambiguous nouns. Counterintuitively, we find that attention mechanisms are likely to distribute more attention to the ambiguous noun itself rather than context tokens, in comparison to other nouns. We conclude that attention is not the main mechanism used by NMT models to incorporate contextual information for WSD. The experimental results suggest that NMT models learn to encode contextual information necessary for WSD in the encoder hidden states. For the attention mechanism in Transformer models, we reveal that the first few layers gradually learn to “align” source and target tokens and the last few layers learn to extract features from the related but unaligned context tokens.

pdf abs
An Investigation of the Interactions Between Pre-Trained Word Embeddings, Character Models and POS Tags in Dependency Parsing
Aaron Smith | Miryam de Lhoneux | Sara Stymne | Joakim Nivre
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We provide a comprehensive analysis of the interactions between pre-trained word embeddings, character models and POS tags in a transition-based dependency parser. While previous studies have shown POS information to be less important in the presence of character models, we show that in fact there are complex interactions between all three techniques. In isolation each produces large improvements over a baseline system using randomly initialised word embeddings only, but combining them quickly leads to diminishing returns. We categorise words by frequency, POS tag and language in order to systematically investigate how each of the techniques affects parsing quality. For many word categories, applying any two of the three techniques is almost as good as the full combined system. Character models tend to be more important for low-frequency open-class words, especially in morphologically rich languages, while POS tags can help disambiguate high-frequency function words. We also show that large character embedding sizes help even for languages with small character sets, especially in morphologically rich languages.

pdf abs
Sentences with Gapping: Parsing and Reconstructing Elided Predicates
Sebastian Schuster | Joakim Nivre | Christopher D. Manning
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Sentences with gapping, such as "Paul likes coffee and Mary tea", lack an overt predicate to indicate the relation between two or more arguments. Surface syntax representations of such sentences are often produced poorly by parsers, and even if correct, not well suited to downstream natural language understanding tasks such as relation extraction that are typically designed to extract information from sentences with canonical clause structure. In this paper, we present two methods for parsing to a Universal Dependencies graph representation that explicitly encodes the elided material with additional nodes and edges. We find that both methods can reconstruct elided material from dependency trees with high accuracy when the parser correctly predicts the existence of a gap. We further demonstrate that one of our methods can be applied to other languages based on a case study on Swedish.

Every year, the Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2018, one of two tasks was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on test input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. This shared task is the second edition; the first took place in 2017 (Zeman et al., 2017). The main metric from 2017 has been kept, allowing for easy comparison with the 2018 results, and two new main metrics have been added. New datasets added to the Universal Dependencies collection between mid-2017 and the spring of 2018 have contributed to increased difficulty of the task this year. In this overview paper, we define the task and the updated evaluation methodology, describe data preparation, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

pdf abs
82 Treebanks, 34 Models: Universal Dependency Parsing with Multi-Treebank Models
Aaron Smith | Bernd Bohnet | Miryam de Lhoneux | Joakim Nivre | Yan Shao | Sara Stymne
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

We present the Uppsala system for the CoNLL 2018 Shared Task on universal dependency parsing. Our system is a pipeline consisting of three components: the first performs joint word and sentence segmentation; the second predicts part-of-speech tags and morphological features; the third predicts dependency trees from words and tags. Instead of training a single parsing model for each treebank, we trained models with multiple treebanks for one language or closely related languages, greatly reducing the number of models. On the official test run, we ranked 7th of 27 teams for the LAS and MLAS metrics. Our system obtained the best scores overall for word segmentation, universal POS tagging, and morphological features.

pdf abs
Universal Word Segmentation: Implementation and Interpretation
Yan Shao | Christian Hardmeier | Joakim Nivre
Transactions of the Association for Computational Linguistics, Volume 6

Word segmentation is a low-level NLP task that is non-trivial for a considerable number of languages. In this paper, we present a sequence tagging framework and apply it to word segmentation for a wide range of languages with different writing systems and typological characteristics. Additionally, we investigate the correlations between various typological factors and word segmentation accuracy. The experimental results indicate that segmentation accuracy is positively related to word boundary markers and negatively to the number of unique non-segmental terms. Based on the analysis, we design a small set of language-specific settings and extensively evaluate the segmentation system on the Universal Dependencies datasets. Our model obtains state-of-the-art accuracies on all the UD languages. It performs substantially better on languages that are non-trivial to segment, such as Chinese, Japanese, Arabic and Hebrew, when compared to previous work.
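As an illustration of how word segmentation can be cast as sequence tagging, the following minimal sketch maps a segmented sentence to per-character boundary tags and back. The B/I/E/S tag set and the function names `words_to_tags` and `tags_to_words` are assumptions made for this example; the abstract does not specify the tag inventory used in the paper.

```python
def words_to_tags(words):
    """Map a list of words to one boundary tag per character.

    Assumed tag set: B (begin), I (inside), E (end), S (single-character word).
    """
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from a character sequence and its boundary tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that does not end in E or S
        words.append(current)
    return words


example_words = ["a", "bc", "def"]
example_tags = words_to_tags(example_words)
restored = tags_to_words("".join(example_words), example_tags)
```

A tagger trained on such character-level labels predicts the tag sequence, and `tags_to_words` turns the prediction back into a segmentation.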

pdf abs
Parser Training with Heterogeneous Treebanks
Sara Stymne | Miryam de Lhoneux | Aaron Smith | Joakim Nivre
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

How to make the most of multiple heterogeneous treebanks when training a monolingual dependency parser is an open question. We start by investigating previously suggested, but little evaluated, strategies for exploiting multiple treebanks based on concatenating training sets, with or without fine-tuning. We go on to propose a new method based on treebank embeddings. We perform experiments for several languages and show that in many cases fine-tuning and treebank embeddings lead to substantial improvements over single treebanks or concatenation, with average gains of 2.0–3.5 LAS points. We argue that treebank embeddings should be preferred due to their conceptual simplicity, flexibility and extensibility.

pdf abs
An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization
Gongbo Tang | Fabienne Cap | Eva Pettersson | Joakim Nivre
Proceedings of the 27th International Conference on Computational Linguistics

In this paper, we apply different NMT models to the problem of historical spelling normalization for five languages: English, German, Hungarian, Icelandic, and Swedish. The NMT models operate at different levels of granularity and use different attention mechanisms and neural network architectures. Our results show that NMT models are much better than SMT models in terms of character error rate. Vanilla RNNs are competitive with GRUs/LSTMs in historical spelling normalization. Transformer models perform better only when provided with more training data. We also find that subword-level models with a small subword vocabulary are better than character-level models. In addition, we propose a hybrid method which further improves the performance of historical spelling normalization.

2017

pdf abs
Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF
Yan Shao | Christian Hardmeier | Jörg Tiedemann | Joakim Nivre
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We present a character-based model for joint segmentation and POS tagging for Chinese. The bidirectional RNN-CRF architecture for general sequence tagging is adapted and applied with novel vector representations of Chinese characters that capture rich contextual information and lower-than-character level features. The proposed model is extensively evaluated and compared with a state-of-the-art tagger on CTB5, CTB9 and UD Chinese. The experimental results indicate that our model is accurate and robust across datasets of different sizes, genres and annotation schemes. We obtain state-of-the-art performance on CTB5, achieving an F1-score of 94.38 for joint segmentation and POS tagging.

pdf abs
Recall is the Proper Evaluation Metric for Word Segmentation
Yan Shao | Christian Hardmeier | Joakim Nivre
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

We extensively analyse the correlations and drawbacks of conventionally employed evaluation metrics for word segmentation. Unlike in standard information retrieval, precision favours under-splitting systems and therefore can be misleading in word segmentation. Overall, based on both theoretical and experimental analysis, we propose that precision should be excluded from the standard evaluation metrics and that the evaluation score obtained by using only recall is sufficient and better correlated with the performance of word segmentation systems.
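To make the metrics under discussion concrete, here is a minimal sketch of word-level precision and recall, where a predicted word counts as correct only if both of its boundaries match the gold segmentation. The function names and the toy strings are assumptions for illustration and are not taken from the paper.

```python
def word_spans(words):
    """Convert a word list into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def precision_recall(gold_words, pred_words):
    """Word-level precision and recall of a predicted segmentation."""
    gold, pred = word_spans(gold_words), word_spans(pred_words)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


gold = ["ab", "c", "d", "ef"]
under_split = ["ab", "cd", "ef"]          # merges two gold words
over_split = ["a", "b", "c", "d", "ef"]   # splits one gold word

p_under, r_under = precision_recall(gold, under_split)
p_over, r_over = precision_recall(gold, over_split)
```

In this toy case each system makes a single boundary error, yet the under-splitting system obtains the higher precision (2/3 against 3/5), while its recall is lower (1/2 against 3/4).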

pdf bib abs
Universal Dependencies
Joakim Nivre | Daniel Zeman | Filip Ginter | Francis Tyers
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts

Universal Dependencies (UD) is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages. This tutorial gives an introduction to the UD framework and resources, from basic design principles to annotation guidelines and existing treebanks. We also discuss tools for developing and exploiting UD treebanks and survey applications of UD in NLP and linguistics.

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

We present the Uppsala submission to the CoNLL 2017 shared task on parsing from raw text to universal dependencies. Our system is a simple pipeline consisting of two components. The first performs joint word and sentence segmentation on raw text; the second predicts dependency trees from raw words. The parser bypasses the need for part-of-speech tagging, but uses word embeddings based on universal tag distributions. We achieved a macro-averaged LAS F1 of 65.11 in the official test run, which improved to 70.49 after bug fixes. We obtained the 2nd best result for sentence segmentation with a score of 89.03.

pdf
Real-valued Syntactic Word Vectors (RSV) for Greedy Neural Dependency Parsing
Ali Basirat | Joakim Nivre
Proceedings of the 21st Nordic Conference on Computational Linguistics

pdf
Machine Learning for Rhetorical Figure Detection: More Chiasmus with Less Annotation
Marie Dubremetz | Joakim Nivre
Proceedings of the 21st Nordic Conference on Computational Linguistics

pdf bib
Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017)
Marie-Catherine de Marneffe | Joakim Nivre | Sebastian Schuster
Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017)

pdf
Universal Dependency Evaluation
Joakim Nivre | Chiao-Ting Fang
Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017)

pdf abs
Arc-Hybrid Non-Projective Dependency Parsing with a Static-Dynamic Oracle
Miryam de Lhoneux | Sara Stymne | Joakim Nivre
Proceedings of the 15th International Conference on Parsing Technologies

In this paper, we extend the arc-hybrid system for transition-based parsing with a swap transition that enables reordering of the words and construction of non-projective trees. Although this extension breaks the arc-decomposability of the transition system, we show how the existing dynamic oracle for this system can be modified and combined with a static oracle only for the swap transition. Experiments on 5 languages show that the new system gives competitive accuracy and is significantly better than a system trained with a purely static oracle.
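The following minimal sketch shows how a swap transition lets an arc-hybrid style system build a tree with crossing arcs. It rests on several assumptions not stated in the abstract: the swap is taken to move the top of the stack to the second position of the buffer, root handling is omitted, arcs are unlabeled, and the transition names and `run_transitions` helper are hypothetical. It applies a hand-written transition sequence and does not implement any oracle.

```python
def run_transitions(n_words, transitions):
    """Apply a transition sequence to words 1..n_words.

    Returns the final stack, buffer and set of (head, dependent) arcs.
    """
    stack, buffer, arcs = [], list(range(1, n_words + 1)), set()
    for t in transitions:
        if t == "SHIFT":    # move the first buffer item onto the stack
            stack.append(buffer.pop(0))
        elif t == "LEFT":   # first buffer item becomes head of stack top
            arcs.add((buffer[0], stack.pop()))
        elif t == "RIGHT":  # second stack item becomes head of stack top
            dependent = stack.pop()
            arcs.add((stack[-1], dependent))
        elif t == "SWAP":   # assumed definition: reorder stack top behind b0
            assert stack[-1] < buffer[0], "swap only items in original order"
            buffer.insert(1, stack.pop())
        else:
            raise ValueError(f"unknown transition: {t}")
    return stack, buffer, arcs


# Target arcs 1->2, 1->3 and 2->4: the arcs 1->3 and 2->4 cross.
sequence = ["SHIFT", "SHIFT", "SWAP", "SHIFT", "RIGHT",
            "SHIFT", "SHIFT", "RIGHT", "RIGHT"]
final_stack, final_buffer, final_arcs = run_transitions(4, sequence)
```

Without the swap step, word 3 could not be attached to word 1 while word 2 is still waiting for its dependent 4, which is why the plain system is limited to projective trees.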

pdf bib
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)
Simonetta Montemagni | Joakim Nivre
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

2016

pdf
Syntax Matters for Rhetorical Structure: The Case of Chiasmus
Marie Dubremetz | Joakim Nivre
Proceedings of the Fifth Workshop on Computational Linguistics for Literature

pdf bib
Should Have, Would Have, Could Have. Investigating Verb Group Representations for Parsing with Universal Dependencies.
Miryam de Lhoneux | Joakim Nivre
Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP

pdf
Applying Neural Networks to English-Chinese Named Entity Transliteration
Yan Shao | Joakim Nivre
Proceedings of the Sixth Named Entity Workshop

pdf abs
Universal Dependencies: A Cross-Linguistic Perspective on Grammar and Lexicon
Joakim Nivre
Proceedings of the Workshop on Grammar and Lexicon: interactions and interfaces (GramLex)

Universal Dependencies is an initiative to develop cross-linguistically consistent grammatical annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning and parsing research from a language typology perspective. It assumes a dependency-based approach to syntax and a lexicalist approach to morphology, which together entail that the fundamental units of grammatical annotation are words. Words have properties captured by morphological annotation and enter into relations captured by syntactic annotation. Moreover, priority is given to relations between lexical content words, as opposed to grammatical function words. In this position paper, I discuss how this approach allows us to capture similarities and differences across typologically diverse languages.

The Universal Dependencies (UD) project was conceived after the substantial recent interest in unifying annotation schemes across languages. With its own annotation principles and abstract inventory for parts of speech, morphosyntactic features and dependency relations, UD aims to facilitate multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. This paper presents the Turkish IMST-UD Treebank, the first Turkish treebank to be in a UD release. The IMST-UD Treebank was automatically converted from the IMST Treebank, which was also recently released. We describe this conversion procedure in detail, complete with mapping tables. We also present our evaluation of the parsing performances of both versions of the IMST Treebank. Our findings suggest that the UD framework is at least as viable for Turkish as the original annotation framework of the IMST Treebank.

pdf
A Transition-Based System for Joint Lexical and Syntactic Analysis
Matthieu Constant | Joakim Nivre
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf abs
The Universal Dependencies Treebank of Spoken Slovenian
Kaja Dobrovoljc | Joakim Nivre
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents the construction of an open-source dependency treebank of spoken Slovenian, the first syntactically annotated collection of spontaneous speech in Slovenian. The treebank has been manually annotated using the Universal Dependencies annotation scheme, a one-layer syntactic annotation scheme with a high degree of cross-modality, cross-framework and cross-language interoperability. In this original application of the scheme to spoken language transcripts, we address a wide spectrum of syntactic particularities in speech, either by extending the scope of application of existing universal labels or by proposing new speech-specific extensions. The initial analysis of the resulting treebank and its comparison with the written Slovenian UD treebank confirms significant syntactic differences between the two language modalities, with spoken data consisting of shorter and more elliptic sentences, fewer and simpler nominal phrases, and more relations marking disfluencies, interaction, deixis and modality.

Cross-linguistically consistent annotation is necessary for sound comparative evaluation and cross-lingual learning experiments. It is also useful for multilingual system development and comparative linguistic studies. Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. In this paper, we describe v1 of the universal guidelines, the underlying design principles, and the currently available treebanks for 33 languages.

pdf abs
Universal Dependencies for Persian
Mojgan Seraji | Filip Ginter | Joakim Nivre
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The Persian Universal Dependency Treebank (Persian UD) is a recent effort of treebanking Persian with Universal Dependencies (UD), an ongoing project that designs unified and cross-linguistically valid grammatical representations including part-of-speech tags, morphological features, and dependency relations. The Persian UD is the version of the Uppsala Persian Dependency Treebank (UPDT) converted to the Universal Dependencies framework and consists of nearly 6,000 sentences and 152,871 word tokens with an average sentence length of 25 words. Besides following different syntactic annotation guidelines, the two treebanks differ in tokenization. All words containing unsegmented clitics (pronominal and copula clitics) annotated with complex labels in the UPDT have been separated from the clitics and appear with distinct labels in the Persian UD. The UPDT's original syntactic annotation scheme is based on Stanford Typed Dependencies. In this paper, we present the approaches taken in the development of the Persian UD.

2015

pdf
Rhetorical Figure Detection: the Case of Chiasmus
Marie Dubremetz | Joakim Nivre
Proceedings of the Fourth Workshop on Computational Linguistics for Literature

pdf
A Multiword Expression Data Set: Annotating Non-Compositionality and Conventionalization for English Noun Compounds
Meghdad Farahmand | Aaron Smith | Joakim Nivre
Proceedings of the 11th Workshop on Multiword Expressions

pdf
Modeling the Statistical Idiosyncrasy of Multiword Expressions
Meghdad Farahmand | Joakim Nivre
Proceedings of the 11th Workshop on Multiword Expressions

pdf
Improving Verb Phrase Extraction from Historical Text by use of Verb Valency Frames
Eva Pettersson | Joakim Nivre
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

pdf bib
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)
Joakim Nivre | Eva Hajičová
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

pdf
ParsPer: A Dependency Parser for Persian
Mojgan Seraji | Bernd Bohnet | Joakim Nivre
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

pdf
Non-Deterministic Oracles for Unrestricted Non-Projective Transition-Based Dependency Parsing
Anders Björkelund | Joakim Nivre
Proceedings of the 14th International Conference on Parsing Technologies

pdf
Ranking Relevant Verb Phrases Extracted from Historical Text
Eva Pettersson | Beáta Megyesi | Joakim Nivre
Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)

pdf
Boosting English-Chinese Machine Transliteration via High Quality Alignment and Multilingual Resources
Yan Shao | Jörg Tiedemann | Joakim Nivre
Proceedings of the Fifth Named Entity Workshop

2014

pdf bib
Squibs: Constrained Arc-Eager Dependency Parsing
Joakim Nivre | Yoav Goldberg | Ryan McDonald
Computational Linguistics, Volume 40, Issue 2 - June 2014

pdf bib
Squibs: Arc-Eager Parsing with the Tree Constraint
Joakim Nivre | Daniel Fernández-González
Computational Linguistics, Volume 40, Issue 2 - June 2014

pdf
A Multilingual Evaluation of Three Spelling Normalisation Methods for Historical Text
Eva Pettersson | Beáta Megyesi | Joakim Nivre
Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)

pdf
Extraction of Nominal Multiword Expressions in French
Marie Dubremetz | Joakim Nivre
Proceedings of the 10th Workshop on Multiword Expressions (MWE)

pdf
Paraphrasing Swedish Compound Nouns in Machine Translation
Edvin Ullman | Joakim Nivre
Proceedings of the 10th Workshop on Multiword Expressions (MWE)

pdf
Issues in Translating Verb-Particle Constructions from German to English
Nina Schottmüller | Joakim Nivre
Proceedings of the 10th Workshop on Multiword Expressions (MWE)

pdf
Adventures in Multilingual Parsing
Joakim Nivre
Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra)

pdf
Treebank Translation for Cross-Lingual Parser Induction
Jörg Tiedemann | Željko Agić | Joakim Nivre
Proceedings of the Eighteenth Conference on Computational Natural Language Learning

pdf
Anaphora Models and Reordering for Phrase-Based SMT
Christian Hardmeier | Sara Stymne | Jörg Tiedemann | Aaron Smith | Joakim Nivre
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf
Estimating Word Alignment Quality for SMT Reordering Tasks
Sara Stymne | Jörg Tiedemann | Joakim Nivre
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf abs
Universal Stanford dependencies: A cross-linguistic typology
Marie-Catherine de Marneffe | Timothy Dozat | Natalia Silveira | Katri Haverinen | Filip Ginter | Joakim Nivre | Christopher D. Manning
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Revisiting the now de facto standard Stanford dependency representation, we propose an improved taxonomy to capture grammatical relations across languages, including morphologically rich ones. We suggest a two-layered taxonomy: a set of broadly attested universal grammatical relations, to which language-specific relations can be added. We emphasize the lexicalist stance of the Stanford Dependencies, which leads to a particular, partially new treatment of compounding, prepositions, and morphology. We show how existing dependency schemes for several languages map onto the universal taxonomy proposed here and close with consideration of practical implications of dependency representation choices for NLP applications, in particular parsing.

pdf abs
A Persian Treebank with Stanford Typed Dependencies
Mojgan Seraji | Carina Jahani | Beáta Megyesi | Joakim Nivre
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present the Uppsala Persian Dependency Treebank (UPDT) with a syntactic annotation scheme based on Stanford Typed Dependencies. The treebank consists of 6,000 sentences and 151,671 tokens with an average sentence length of 25 words. The data is from different genres, including newspaper articles and fiction, as well as technical descriptions and texts about culture and art, taken from the open source Uppsala Persian Corpus (UPC). The syntactic annotation scheme is extended for Persian to include all syntactic relations that could not be covered by the primary scheme developed for English. In addition, we present open source tools for automatic analysis of Persian containing a text normalizer, a sentence segmenter and tokenizer, a part-of-speech tagger, and a parser. The treebank and the parser have been developed simultaneously in a bootstrapping procedure. The result of a parsing experiment shows an overall labeled attachment score of 82.05% and an unlabeled attachment score of 85.29%. The treebank is freely available as an open source resource.
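For readers unfamiliar with the two scores reported above, the sketch below computes unlabeled and labeled attachment scores from per-token (head, label) pairs. The function name, the toy sentence and its labels are assumptions for illustration; real evaluations may differ in details such as the treatment of punctuation.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for two equal-length lists of (head, label) pairs.

    UAS counts tokens whose predicted head is correct;
    LAS additionally requires the dependency label to be correct.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses differ in length")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    full_hits = sum(g == p for g, p in zip(gold, pred))
    return head_hits / len(gold), full_hits / len(gold)


# Toy four-token sentence; head 0 stands for the artificial root.
gold_tree = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "nmod")]
pred_tree = [(2, "nsubj"), (0, "root"), (2, "iobj"), (2, "nmod")]

uas, las = attachment_scores(gold_tree, pred_tree)
```

Here three of four heads are correct (UAS 0.75) but only two tokens have both head and label correct (LAS 0.5), so LAS can never exceed UAS.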

pdf
On WordNet Semantic Classes and Dependency Parsing
Kepa Bengoetxea | Eneko Agirre | Joakim Nivre | Yue Zhang | Koldo Gojenola
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

pdf bib
Squibs: Going to the Roots of Dependency Parsing
Miguel Ballesteros | Joakim Nivre
Computational Linguistics, Volume 39, Issue 1 - March 2013

pdf
Parsing Morphologically Rich Languages: Introduction to the Special Issue
Reut Tsarfaty | Djamé Seddah | Sandra Kübler | Joakim Nivre
Computational Linguistics, Volume 39, Issue 1 - March 2013

pdf bib
Divisible Transition Systems and Multiplanar Dependency Parsing
Carlos Gómez-Rodríguez | Joakim Nivre
Computational Linguistics, Volume 39, Issue 4 - December 2013

pdf
Target Language Adaptation of Discriminative Transfer Parsers
Oscar Täckström | Ryan McDonald | Joakim Nivre
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Latent Anaphora Resolution for Cross-Lingual Pronoun Prediction
Christian Hardmeier | Jörg Tiedemann | Joakim Nivre
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib abs
Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging
Oscar Täckström | Dipanjan Das | Slav Petrov | Ryan McDonald | Joakim Nivre
Transactions of the Association for Computational Linguistics, Volume 1

We consider the construction of part-of-speech taggers for resource-poor languages. Recently, manually constructed tag dictionaries from Wiktionary and dictionaries projected via bitext have been used as type constraints to overcome the scarcity of annotated data in this setting. In this paper, we show that additional token constraints can be projected from a resource-rich source language to a resource-poor target language via word-aligned bitext. We present several models to this end; in particular a partially observed conditional random field model, where coupled token and type constraints provide a partial signal for training. Averaged across eight previously studied Indo-European languages, our model achieves a 25% relative error reduction over the prior state of the art. We further present successful results on seven additional languages from different families, empirically demonstrating the applicability of coupled token and type constraints across a diverse set of languages.

pdf abs
Training Deterministic Parsers with Non-Deterministic Oracles
Yoav Goldberg | Joakim Nivre
Transactions of the Association for Computational Linguistics, Volume 1

Greedy transition-based parsers are very fast but tend to suffer from error propagation. This problem is aggravated by the fact that they are normally trained using oracles that are deterministic and incomplete in the sense that they assume a unique canonical path through the transition system and are only valid as long as the parser does not stray from this path. In this paper, we give a general characterization of oracles that are nondeterministic and complete, present a method for deriving such oracles for transition systems that satisfy a property we call arc decomposition, and instantiate this method for three well-known transition systems from the literature. We say that these oracles are dynamic, because they allow us to dynamically explore alternative and nonoptimal paths during training — in contrast to oracles that statically assume a unique optimal path. Experimental evaluation on a wide range of data sets clearly shows that using dynamic oracles to train greedy parsers gives substantial improvements in accuracy. Moreover, this improvement comes at no cost in terms of efficiency, unlike other techniques like beam search.

Joint morphological and syntactic analysis has been proposed as a way of improving parsing accuracy for richly inflected languages. Starting from a transition-based model for joint part-of-speech tagging and dependency parsing, we explore different ways of integrating morphological features into the model. We also investigate the use of rule-based morphological analyzers to provide hard or soft lexical constraints and the use of word clusters to tackle the sparsity of lexical features. Evaluation on five morphologically rich languages (Czech, Finnish, German, Hungarian, and Russian) shows consistent improvements in both morphological and syntactic accuracy for joint prediction over a pipeline model, with further improvements thanks to lexical constraints and word clusters. The final results improve the state of the art in dependency parsing for all languages.

pdf
A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy
Francesco Sartorio | Giorgio Satta | Joakim Nivre
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
Christian Hardmeier | Sara Stymne | Jörg Tiedemann | Joakim Nivre
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

pdf
Tunable Distortion Limits and Corpus Cleaning for SMT
Sara Stymne | Christian Hardmeier | Jörg Tiedemann | Joakim Nivre
Proceedings of the Eighth Workshop on Statistical Machine Translation

pdf
Feature Weight Optimization for Discourse-Level SMT
Sara Stymne | Christian Hardmeier | Jörg Tiedemann | Joakim Nivre
Proceedings of the Workshop on Discourse in Machine Translation

pdf bib
Lithuanian Dependency Parsing with Rich Morphological Features
Jurgita Kapočiūtė-Dzikienė | Joakim Nivre | Algis Krupavičius
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

pdf
Normalisation of Historical Text Using Context-Sensitive Weighted Levenshtein Distance and Compound Splitting
Eva Pettersson | Beáta Megyesi | Joakim Nivre
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

pdf
Statistical Machine Translation with Readability Constraints
Sara Stymne | Jörg Tiedemann | Christian Hardmeier | Joakim Nivre
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

2012

pdf
A Dynamic Oracle for Arc-Eager Dependency Parsing
Yoav Goldberg | Joakim Nivre
Proceedings of COLING 2012

pdf
Analyzing the Effect of Global Learning and Beam-Search on Transition-Based Dependency Parsing
Yue Zhang | Joakim Nivre
Proceedings of COLING 2012: Posters

pdf
Parsing the Past - Identification of Verb Constructions in Historical Text
Eva Pettersson | Beáta Megyesi | Joakim Nivre
Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

pdf
Tree Kernels for Machine Translation Quality Estimation
Christian Hardmeier | Joakim Nivre | Jörg Tiedemann
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf
Dependency Parsers for Persian
Mojgan Seraji | Beáta Megyesi | Joakim Nivre
Proceedings of the 10th Workshop on Asian Language Resources

pdf
Cross-Framework Evaluation for Statistical Parsing
Reut Tsarfaty | Joakim Nivre | Evelina Andersson
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf
MaltOptimizer: An Optimization Tool for MaltParser
Miguel Ballesteros | Joakim Nivre
Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Document-Wide Decoding for Phrase-Based Statistical Machine Translation
Christian Hardmeier | Joakim Nivre | Jörg Tiedemann
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf
A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing
Bernd Bohnet | Joakim Nivre
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Joint Evaluation of Morphological Segmentation and Syntactic Parsing
Reut Tsarfaty | Joakim Nivre | Evelina Andersson
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf abs
A Basic Language Resource Kit for Persian
Mojgan Seraji | Beáta Megyesi | Joakim Nivre
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Persian, with about 100,000,000 speakers worldwide, belongs to the group of languages with less developed linguistically annotated resources and tools. The few existing resources and tools are neither open source nor freely available. Thus, our goal is to develop open source resources such as corpora and treebanks, and tools for data-driven linguistic analysis of Persian. We do this by exploring the reusability of existing resources and adapting state-of-the-art methods for the linguistic annotation. We present fully functional tools for text normalization, sentence segmentation, tokenization, part-of-speech tagging, and parsing. As for resources, we describe the Uppsala PErsian Corpus (UPEC) which is a modified version of the Bijankhan corpus with additional sentence segmentation and consistent tokenization modified for more appropriate syntactic annotation. The corpus consists of 2,782,109 tokens and is annotated with parts of speech and morphological features. A treebank is derived from UPEC with an annotation scheme based on Stanford Typed Dependencies and is planned to consist of 10,000 sentences of which 215 have already been annotated. Keywords: BLARK for Persian, PoS tagged corpus, Persian treebank

pdf abs
MaltOptimizer: A System for MaltParser Optimization
Miguel Ballesteros | Joakim Nivre
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Freely available statistical parsers often require careful optimization to produce state-of-the-art results, which can be a non-trivial task especially for application developers who are not interested in parsing research for its own sake. We present MaltOptimizer, a freely available tool developed to facilitate parser optimization using the open-source system MaltParser, a data-driven parser-generator that can be used to train dependency parsers given treebank data. MaltParser offers a wide range of parameters for optimization, including nine different parsing algorithms, two different machine learning libraries (each with a number of different learners), and an expressive specification language that can be used to define arbitrarily rich feature models. MaltOptimizer is an interactive system that first performs an analysis of the training set in order to select a suitable starting point for optimization and then guides the user through the optimization of parsing algorithm, feature model, and learning algorithm. Empirical evaluation on data from the CoNLL 2006 and 2007 shared tasks on dependency parsing shows that MaltOptimizer consistently improves over the baseline of default settings and sometimes even surpasses the result of manual optimization.

2011

pdf
Analyzing and Integrating Dependency Parsers
Ryan McDonald | Joakim Nivre
Computational Linguistics, Volume 37, Issue 1 - March 2011

pdf
A Survival Analysis of Fixation Times in Reading
Mattias Nilsson | Joakim Nivre
Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics

pdf
Automatic Verb Extraction from Historical Swedish Texts
Eva Pettersson | Joakim Nivre
Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

pdf bib
Proceedings of the 12th International Conference on Parsing Technologies
Harry Bunt | Joakim Nivre | Özlem Çetinoglu
Proceedings of the 12th International Conference on Parsing Technologies

pdf bib
Invited Paper: Bare-Bones Dependency Parsing – A Case for Occam’s Razor?
Joakim Nivre
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

pdf
Transition-based Dependency Parsing with Rich Non-local Features
Yue Zhang | Joakim Nivre
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf
Improving Dependency Parsing with Semantic Classes
Eneko Agirre | Kepa Bengoetxea | Koldo Gojenola | Joakim Nivre
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Predicting Thread Discourse Structure over Technical Web Forums
Li Wang | Marco Lui | Su Nam Kim | Joakim Nivre | Timothy Baldwin
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf
Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation
Reut Tsarfaty | Joakim Nivre | Evelina Andersson
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf
Clausal parsing helps data-driven dependency parsing: Experiments with Hindi
Samar Husain | Phani Gadde | Joakim Nivre | Rajeev Sangal
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf
On the Role of Morphosyntactic Features in Hindi Dependency Parsing
Bharat Ram Ambati | Samar Husain | Joakim Nivre | Rajeev Sangal
Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages

pdf
Linear Inversion Transduction Grammar Alignments as a Second Translation Path
Markus Saers | Joakim Nivre | Dekai Wu
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf
Towards a Data-Driven Model of Eye Movement Control in Reading
Mattias Nilsson | Joakim Nivre
Proceedings of the 2010 Workshop on Cognitive Modeling and Computational Linguistics

pdf bib
A Systematic Comparison between Inversion Transduction Grammar and Linear Transduction Grammar for Word Alignment
Markus Saers | Joakim Nivre | Dekai Wu
Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation

pdf
Evaluation of Dependency Parsers on Unbounded Dependencies
Joakim Nivre | Laura Rimell | Ryan McDonald | Carlos Gómez-Rodríguez
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf
Benchmarking of Statistical Dependency Parsers for French
Marie Candito | Joakim Nivre | Pascal Denis | Enrique Henestroza Anguiano
Coling 2010: Posters

pdf bib
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Jan Hajič | Sandra Carberry | Stephen Clark | Joakim Nivre
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf
A Transition-Based Parser for 2-Planar Dependency Structures
Carlos Gómez-Rodríguez | Joakim Nivre
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
Proceedings of the ACL 2010 Conference Short Papers
Jan Hajič | Sandra Carberry | Stephen Clark | Joakim Nivre
Proceedings of the ACL 2010 Conference Short Papers

pdf
Word Alignment with Stochastic Bracketing Linear Inversion Transduction Grammar
Markus Saers | Joakim Nivre | Dekai Wu
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf abs
The English-Swedish-Turkish Parallel Treebank
Beáta Megyesi | Bengt Dahlqvist | Éva Á. Csató | Joakim Nivre
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We describe a syntactically annotated parallel corpus containing typologically partly different languages, namely English, Swedish and Turkish. The corpus consists of approximately 300 000 tokens in Swedish, 160 000 in Turkish and 150 000 in English, containing both fiction and technical documents. We build the corpus by using the Uplug toolkit for automatic structural markup, such as tokenization and sentence segmentation, as well as sentence and word alignment. In addition, we use basic language resource kits for the linguistic analysis of the languages involved. The annotation is carried out on various layers, from morphological and part-of-speech analysis to dependency structures. The tools used for linguistic annotation, e.g., the HunPos tagger and MaltParser, are freely available data-driven resources, trained on existing corpora and treebanks for each language. The parallel treebank is used in teaching and linguistic research to study the relationship between the structurally different languages. In order to study the treebank, several tools have been developed for the visualization of the annotation and alignment, allowing search for linguistic patterns.

As the interest of the NLP community grows to develop several treebanks also for languages other than English, we observe efforts towards evaluating the impact of different annotation strategies used to represent particular languages or with reference to particular tasks. This paper contributes to the debate on the influence of resources used for the training and development on the performance of parsing systems. It presents a comparative analysis of the results achieved by three different dependency parsers developed and tested with respect to two treebanks for the Italian language, namely TUT and ISST-TANL, which differ significantly at the level of both corpus composition and adopted dependency representations.

2009

pdf
Non-Projective Dependency Parsing in Expected Linear Time
Joakim Nivre
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)
Alex Lascarides | Claire Gardent | Joakim Nivre
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

pdf
Learning Where to Look: Modeling Eye Movements in Reading
Mattias Nilsson | Joakim Nivre
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

pdf
Learning Stochastic Bracketing Inversion Transduction Grammars with a Cubic Time Biparsing Algorithm
Markus Saers | Joakim Nivre | Dekai Wu
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

pdf
Parsing Formal Languages using Natural Language Parsing Techniques
Jens Nilsson | Welf Löwe | Johan Hall | Joakim Nivre
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

pdf
An Improved Oracle for Dependency Parsing with Online Reordering
Joakim Nivre | Marco Kuhlmann | Johan Hall
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

pdf
Voting and Stacking in Data-Driven Dependency Parsing
Mark Fishel | Joakim Nivre
Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)

2008

pdf
A Dependency-Driven Parser for German Dependency and Constituency Representations
Johan Hall | Joakim Nivre
Proceedings of the Workshop on Parsing German

pdf
The CoNLL 2008 Shared Task on Joint Parsing of Syntactic and Semantic Dependencies
Mihai Surdeanu | Richard Johansson | Adam Meyers | Lluís Màrquez | Joakim Nivre
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

pdf
Dependency Parsing of Turkish
Gülşen Eryiğit | Joakim Nivre | Kemal Oflazer
Computational Linguistics, Volume 34, Number 3, September 2008

pdf
Algorithms for Deterministic Incremental Dependency Parsing
Joakim Nivre
Computational Linguistics, Volume 34, Number 4, December 2008

pdf
Erratum: Dependency Parsing of Turkish
Gülşen Eryiğit | Joakim Nivre | Kemal Oflazer
Computational Linguistics, Volume 34, Number 4, December 2008

pdf abs
MaltEval: an Evaluation and Visualization Tool for Dependency Parsing
Jens Nilsson | Joakim Nivre
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents a freely available evaluation tool for dependency parsing: MaltEval (http://w3.msi.vxu.se/users/jni/malteval). It is flexible and extensible, and provides functionality for both quantitative evaluation and visualization of dependency structure. The quantitative evaluation is compatible with other standard evaluation software for dependency structure which does not produce visualization of dependency structure, and can output more details as well as new types of evaluation metrics. In addition, MaltEval has generic support for confusion matrices. It can also produce statistical significance tests when more than one parsed file is specified. The visualization module also has the ability to highlight discrepancies between the gold-standard files and the parsed files, and it comes with easy-to-use GUI functionality to search in the dependency structure of the input files.

pdf abs
Swedish-Turkish Parallel Treebank
Beáta Megyesi | Bengt Dahlqvist | Eva Pettersson | Joakim Nivre
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we describe our work on building a parallel treebank for a less studied and typologically dissimilar language pair, namely Swedish and Turkish. The treebank is a balanced syntactically annotated corpus containing both fiction and technical documents. In total, it consists of approximately 160,000 tokens in Swedish and 145,000 in Turkish. The texts are linguistically annotated using different layers from part of speech tags and morphological features to dependency annotation. Each layer is automatically processed by using basic language resources for the involved languages. The sentences and words are aligned, and partly manually corrected. We create the treebank by reusing and adjusting existing tools for the automatic annotation, alignment, and their correction and visualization. The treebank was developed within the project supporting research environment for minor languages, aiming to create representative language resources for language pairs dissimilar in language structure. Therefore, effort is put into developing a general method for the formatting and annotation procedure, as well as using tools that can be applied to other language pairs easily.

pdf
Integrating Graph-Based and Transition-Based Dependency Parsers
Joakim Nivre | Ryan McDonald
Proceedings of ACL-08: HLT

pdf
Parsing the SynTagRus Treebank of Russian
Joakim Nivre | Igor M. Boguslavsky | Leonid L. Iomdin
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

pdf
Data-Driven Dependency Parsing across Languages and Domains: Perspectives from the CoNLL-2007 Shared task
Joakim Nivre
Proceedings of the Tenth International Conference on Parsing Technologies

pdf bib
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)
Joakim Nivre | Heiki-Jaan Kaalep | Kadri Muischnek | Mare Koit
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)

pdf
A Hybrid Constituency-Dependency Parser for Swedish
Johan Hall | Joakim Nivre | Jens Nilsson
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)

pdf
Characterizing the Errors of Data-Driven Dependency Parsing Models
Ryan McDonald | Joakim Nivre
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf
The CoNLL 2007 Shared Task on Dependency Parsing
Joakim Nivre | Johan Hall | Sandra Kübler | Ryan McDonald | Jens Nilsson | Sebastian Riedel | Deniz Yuret
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf
Single Malt or Blended? A Study in Multilingual Parser Optimization
Johan Hall | Jens Nilsson | Joakim Nivre | Gülşen Eryiğit | Beáta Megyesi | Mattias Nilsson | Markus Saers
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf
Incremental Non-Projective Dependency Parsing
Joakim Nivre
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

pdf
Generalizing Tree Transformations for Inductive Dependency Parsing
Jens Nilsson | Joakim Nivre | Johan Hall
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

pdf
Constraints on Non-Projective Dependency Parsing
Joakim Nivre
11th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Labeled Pseudo-Projective Dependency Parsing with Support Vector Machines
Joakim Nivre | Johan Hall | Jens Nilsson | Gülşen Eryiğit | Svetoslav Marinov
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)

pdf
Graph Transformations in Data-Driven Dependency Parsing
Jens Nilsson | Joakim Nivre | Johan Hall
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf
Discriminative Classifiers for Deterministic Dependency Parsing
Johan Hall | Joakim Nivre | Jens Nilsson
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf
Mildly Non-Projective Dependency Structures
Marco Kuhlmann | Joakim Nivre
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf abs
MaltParser: A Data-Driven Parser-Generator for Dependency Parsing
Joakim Nivre | Johan Hall | Jens Nilsson
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We introduce MaltParser, a data-driven parser generator for dependency parsing. Given a treebank in dependency format, MaltParser can be used to induce a parser for the language of the treebank. MaltParser supports several parsing algorithms and learning algorithms, and allows user-defined feature models, consisting of arbitrary combinations of lexical features, part-of-speech features and dependency features. MaltParser is freely available for research and educational purposes and has been evaluated empirically on Swedish, English, Czech, Danish and Bulgarian.

pdf abs
Talbanken05: A Swedish Treebank with Phrase Structure and Dependency Annotation
Joakim Nivre | Jens Nilsson | Johan Hall
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We introduce Talbanken05, a Swedish treebank based on a syntactically annotated corpus from the 1970s, Talbanken76, converted to modern formats. The treebank is available in three different formats, besides the original one: two versions of phrase structure annotation and one dependency-based annotation, all of which are encoded in XML. In this paper, we describe the conversion process and exemplify the available formats. The treebank is freely available for research and educational purposes.

pdf
A generic architecture for data-driven dependency parsing
Johan Hall | Joakim Nivre
Proceedings of the 15th Nordic Conference of Computational Linguistics (NODALIDA 2005)

This paper presents a deterministic parsing algorithm for projective dependency grammar. The running time of the algorithm is linear in the length of the input string, and the dependency graph produced is guaranteed to be projective and acyclic. The algorithm has been experimentally evaluated in parsing unrestricted Swedish text, achieving an accuracy above 85% with a very simple grammar.