Notes on the Evaluation of Dependency Parsers Obtained Through Cross-Lingual Projection
Kathrin Spreyer
Coling 2010: Posters

Training Parsers on Partial Trees: A Cross-language Comparison
Kathrin Spreyer | Lilja Øvrelid | Jonas Kuhn
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present a study that compares data-driven dependency parsers obtained by means of annotation projection between language pairs of varying structural similarity. We show how the partial dependency trees projected from English to Dutch, Italian and German can be exploited to train parsers for the target languages. We evaluate the parsers against manual gold standard annotations and find that the projected parsers substantially outperform our heuristic baselines by 9-25% UAS, which corresponds to a 21-43% reduction in error rate. A comparative error analysis focuses on how the projected target language parsers handle subjects, which is especially interesting for Italian as an instance of a pro-drop language. For Dutch, we further present experiments with German as an alternative source language. In both source languages, we contrast standard baseline parsers with parsers that are enhanced with the predictions from large-scale LFG grammars through a technique of parser stacking, and show that improvements of the source language parser can directly lead to similar improvements of the projected target language parser.
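
The abstract above reports gains both as UAS (unlabeled attachment score) differences and as error rate reductions. As a minimal sketch of how the two figures relate, the Python snippet below computes UAS for two parsers and the resulting relative error reduction. The function names and the five-token head lists are hypothetical toy values chosen for illustration; they are not data or code from the paper.

```python
def uas(gold_heads, pred_heads):
    """Unlabeled attachment score in percent.

    Each list holds one head index per token (0 = artificial root).
    A token counts as correct when its predicted head equals the gold head.
    """
    if not gold_heads or len(gold_heads) != len(pred_heads):
        raise ValueError("head lists must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return 100.0 * correct / len(gold_heads)


def relative_error_reduction(baseline_uas, system_uas):
    """Percentage of the baseline's attachment errors removed by the system."""
    baseline_error = 100.0 - baseline_uas
    system_error = 100.0 - system_uas
    if baseline_error == 0:
        raise ValueError("baseline makes no errors; reduction is undefined")
    return 100.0 * (baseline_error - system_error) / baseline_error


# Hypothetical toy sentence with five tokens (not taken from the paper).
gold = [2, 0, 2, 5, 3]
baseline = [0, 0, 2, 3, 2]   # e.g. output of a simple heuristic
projected = [2, 0, 2, 3, 3]  # e.g. output of a parser trained on projected trees

baseline_score = uas(gold, baseline)
projected_score = uas(gold, projected)
reduction = relative_error_reduction(baseline_score, projected_score)
print(f"baseline UAS {baseline_score:.1f}, projected UAS {projected_score:.1f}, "
      f"error reduction {reduction:.1f}%")
```

Because the error reduction is measured against the baseline's remaining errors, the same absolute UAS gain yields a larger relative reduction when the baseline is already strong.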


Data-Driven Dependency Parsing of New Languages Using Incomplete and Noisy Training Data
Kathrin Spreyer | Jonas Kuhn
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

Improving data-driven dependency parsing using large-scale LFG grammars
Lilja Øvrelid | Jonas Kuhn | Kathrin Spreyer
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers


Identification of Comparable Argument-Head Relations in Parallel Corpora
Kathrin Spreyer | Jonas Kuhn | Bettina Schrader
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present the machine learning framework that we are developing to support explorative search for non-trivial linguistic configurations in low-density languages (languages with no or few NLP tools). The approach exploits advanced existing analysis tools for high-density languages and word-aligned multi-parallel corpora to bridge across languages. The goal is to find a methodology that minimizes the amount of human expert intervention needed, while producing high-quality search and annotation tools. One of the main challenges is the susceptibility of a complex system combining various automatic analysis components to hard-to-control noise from a number of sources. We present systematic experiments investigating to what degree the noise issue can be overcome by (i) exploiting more than one perspective on the target language data by considering multiple translations in the parallel corpus, and (ii) using minimally supervised learning techniques such as co-training and self-training to take advantage of a larger pool of data for generalization. We observe that while (i) does help in training individual machine learning models, a cyclic bootstrapping process seems to suffer too much from noise. A preliminary conclusion is that in a practical approach, one has to rely on a higher degree of supervision or on noise detection heuristics.
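
The abstract above mentions self-training as one of the minimally supervised techniques investigated. The sketch below illustrates the general idea only: a model labels unlabeled items, keeps the predictions it is confident about, and retrains on the enlarged set. It assumes a toy one-dimensional nearest-centroid classifier and a hypothetical `min_margin` confidence threshold; it is not the authors' framework, features, or learner.

```python
def fit_centroids(examples):
    """Return the mean feature value per label from (value, label) pairs."""
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Return (label, margin) for x; needs at least two labels.

    The margin is the distance gap between the runner-up and the nearest
    centroid, used here as a crude confidence estimate.
    """
    ranked = sorted(centroids, key=lambda label: abs(x - centroids[label]))
    best, second = ranked[0], ranked[1]
    margin = abs(x - centroids[second]) - abs(x - centroids[best])
    return best, margin


def self_train(labeled, unlabeled, min_margin, max_rounds=5):
    """Grow the labeled set with confident predictions, then retrain."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(labeled)
        confident, rest = [], []
        for x in pool:
            label, margin = predict(centroids, x)
            (confident if margin >= min_margin else rest).append((x, label))
        if not confident:
            break  # nothing new to learn from; stop bootstrapping
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return fit_centroids(labeled), pool


# Hypothetical toy data: two seed examples and three unlabeled values.
seed = [(0.0, "a"), (10.0, "b")]
model, leftover = self_train(seed, [1.0, 9.0, 5.2], min_margin=2.0)
print(model, leftover)
```

The threshold is where noise enters: a wrongly labeled but confident item is added to the training set and influences every later round, which is the kind of error accumulation the abstract reports for cyclic bootstrapping.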

Projection-based Acquisition of a Temporal Labeller
Kathrin Spreyer | Anette Frank
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I


The TIGER 700 RMRS Bank: RMRS Construction from Dependencies
Kathrin Spreyer | Anette Frank
Proceedings of the Sixth International Workshop on Linguistically Interpreted Corpora (LINC-2005)