Eva Pettersson


2020

Czech Historical Named Entity Corpus v 1.0
Helena Hubková | Pavel Král | Eva Pettersson
Proceedings of the 12th Language Resources and Evaluation Conference

As the number of digitized archival documents increases rapidly, named entity recognition (NER) in historical documents has become very important for information extraction and data mining. This task requires an annotated corpus, which has until now been missing for Czech. In this paper we present a new annotated data collection for historical NER, composed of Czech historical newspapers. The corpus is freely available for research purposes. For this corpus, we have defined relevant domain-specific named entity types and created an annotation manual for corpus labelling. We further conducted experiments on this corpus using recurrent neural networks, comparing randomly initialized embeddings with static and dynamic fastText word embeddings. We achieved a 0.73 F1 score with a bidirectional LSTM model using static fastText embeddings.
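NER systems such as the one above are conventionally scored with entity-level F1, where a prediction counts only if both the span and the entity type match the gold annotation exactly. A minimal sketch of that metric; the example spans and types below are hypothetical, not taken from the Czech corpus.

```python
# Entity-level precision/recall/F1 over sets of (start, end, type) spans.
def entity_f1(gold, pred):
    """A prediction is correct only on an exact span-and-type match."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)                     # exact matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical gold and predicted entities for one sentence:
gold = [(0, 2, "PER"), (5, 6, "LOC"), (9, 11, "ORG")]
pred = [(0, 2, "PER"), (5, 6, "ORG"), (9, 11, "ORG")]
p, r, f = entity_f1(gold, pred)   # one span has the wrong type
```

Here two of three predictions match exactly, so precision, recall, and F1 all come out to 2/3.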

2019

Matching Keys and Encrypted Manuscripts
Eva Pettersson | Beata Megyesi
Proceedings of the 22nd Nordic Conference on Computational Linguistics

Historical cryptology is the study of historical encrypted messages, aiming at their decryption by analyzing the mathematical, linguistic and other coding patterns as well as their historical context. Libraries and archives hold a large number of ciphers, as well as keys describing the method used to transform the plaintext message into a ciphertext. In this paper, we present work on automatically mapping keys to ciphers to reconstruct the original plaintext message, and on using language models generated from historical texts to guess the underlying plaintext language.
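The language-guessing step can be illustrated with character-level language models: train one per candidate language, score the reconstructed plaintext under each, and pick the highest-scoring language. A minimal bigram sketch under toy assumptions; the training strings and smoothing constant below are stand-ins, not the historical corpora or models used in the paper.

```python
import math
from collections import Counter

def train_bigram_model(text):
    """Return a scorer: add-one-smoothed character-bigram log-probability."""
    bigrams = Counter(zip(text, text[1:]))
    unigrams = Counter(text)
    def logprob(candidate):
        lp = 0.0
        for a, b in zip(candidate, candidate[1:]):
            # add-one smoothing over an assumed 27-symbol alphabet
            lp += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + 27))
        return lp
    return logprob

# Toy training samples standing in for historical text collections:
models = {
    "en": train_bigram_model("the quick brown fox jumps over the lazy dog " * 5),
    "de": train_bigram_model("der schnelle braune fuchs springt ueber den hund " * 5),
}

candidate = "the brown dog"                     # a reconstructed plaintext
best = max(models, key=lambda lang: models[lang](candidate))   # -> "en"
```

The real setting would use larger historical corpora and higher-order models, but the decision rule (argmax over per-language log-probabilities) is the same.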

2018

An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization
Gongbo Tang | Fabienne Cap | Eva Pettersson | Joakim Nivre
Proceedings of the 27th International Conference on Computational Linguistics

In this paper, we apply different NMT models to the problem of historical spelling normalization for five languages: English, German, Hungarian, Icelandic, and Swedish. The NMT models operate at different levels, use different attention mechanisms, and have different neural network architectures. Our results show that NMT models are much better than SMT models in terms of character error rate. Vanilla RNNs are competitive with GRUs/LSTMs in historical spelling normalization. Transformer models perform better only when provided with more training data. We also find that subword-level models with a small subword vocabulary outperform character-level models. In addition, we propose a hybrid method which further improves the performance of historical spelling normalization.
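Character error rate (CER), the evaluation measure used in the comparison above, is the Levenshtein edit distance between the system output and the gold normalization, divided by the length of the gold form. A minimal sketch; the word pair in the example is hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def cer(hypothesis, reference):
    """Character error rate: edit distance normalized by reference length."""
    return levenshtein(hypothesis, reference) / len(reference)

# Hypothetical normalization: historical "goode" against modern "good".
rate = cer("goode", "good")    # one deletion over four characters -> 0.25
```

In practice CER is averaged over all tokens in the test set, so it rewards systems that get most characters right even when the whole word is not exactly correct.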

2017

Annotating errors in student texts: First experiences and experiments
Sara Stymne | Eva Pettersson | Beáta Megyesi | Anne Palmér
Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition

Comparing Rule-based and SMT-based Spelling Normalisation for English Historical Texts
Gerold Schneider | Eva Pettersson | Michael Percillier
Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language

2015

Improving Verb Phrase Extraction from Historical Text by use of Verb Valency Frames
Eva Pettersson | Joakim Nivre
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

Ranking Relevant Verb Phrases Extracted from Historical Text
Eva Pettersson | Beáta Megyesi | Joakim Nivre
Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)

2014

A Multilingual Evaluation of Three Spelling Normalisation Methods for Historical Text
Eva Pettersson | Beáta Megyesi | Joakim Nivre
Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)

2013

Normalisation of Historical Text Using Context-Sensitive Weighted Levenshtein Distance and Compound Splitting
Eva Pettersson | Beáta Megyesi | Joakim Nivre
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

2012

Parsing the Past - Identification of Verb Constructions in Historical Text
Eva Pettersson | Beáta Megyesi | Joakim Nivre
Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

2011

Automatic Verb Extraction from Historical Swedish Texts
Eva Pettersson | Joakim Nivre
Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

2008

Swedish-Turkish Parallel Treebank
Beáta Megyesi | Bengt Dahlqvist | Eva Pettersson | Joakim Nivre
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we describe our work on building a parallel treebank for a less studied and typologically dissimilar language pair, namely Swedish and Turkish. The treebank is a balanced syntactically annotated corpus containing both fiction and technical documents. In total, it consists of approximately 160,000 tokens in Swedish and 145,000 in Turkish. The texts are linguistically annotated in several layers, from part-of-speech tags and morphological features to dependency annotation. Each layer is processed automatically using basic language resources for the languages involved. The sentences and words are aligned and partly manually corrected. We created the treebank by reusing and adjusting existing tools for automatic annotation, alignment, and their correction and visualization. The treebank was developed within a project supporting research environments for minor languages, aiming to create representative language resources for language pairs dissimilar in structure. Effort was therefore put into developing a general method for the formatting and annotation procedure, as well as into using tools that can easily be applied to other language pairs.

2004

MT Goes Farming: Comparing Two Machine Translation Approaches on a New Domain
Per Weijnitz | Eva Forsbom | Ebba Gustavii | Eva Pettersson | Jörg Tiedemann
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04)