Bettina Schrader


2008

pdf
Identification of Comparable Argument-Head Relations in Parallel Corpora
Kathrin Spreyer | Jonas Kuhn | Bettina Schrader
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present the machine learning framework that we are developing, in order to support explorative search for non-trivial linguistic configurations in low-density languages (languages with no or few NLP tools). The approach exploits advanced existing analysis tools for high-density languages and word-aligned multi-parallel corpora to bridge across languages. The goal is to find a methodology that minimizes the amount of human expert intervention needed, while producing high-quality search and annotation tools. One of the main challenges is the susceptibility of a complex system combining various automatic analysis components to hard-to-control noise from a number of sources. We present systematic experiments investigating to what degree the noise issue can be overcome by (i) exploiting more than one perspective on the target language data by considering multiple translations in the parallel corpus, and (ii) using minimally supervised learning techniques such as co-training and self-training to take advantage of a larger pool of data for generalization. We observe that while (i) does help in the training individual machine learning models, a cyclic bootstrapping process seems to suffer too much from noise. A preliminary conclusion is that in a practical approach, one has to rely on a higher degree of supervision or on noise detection heuristics.

2006

pdf
Non-probabilistic alignment of rare German and English nominal expressions
Bettina Schrader
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We present an alignment strategy that specifically deals with the correct alignment of rare German nominal compounds to their English multiword translations. It recognizes compounds and multiwords based on their character lengths and on their most frequent POS-patterns, and aligns them based on their length ratios. Our approach is designed on the basis of a data analysis on roughly 500 German hapax legomena, and as it does not use any frequency or co-occurrence information, it is well-suited to align rare compounds, but also achieves good results for more frequent expressions. Experiment results show that the strategy is able to correctly identify correct translations for 70% of the compound hapaxes in our data set. Additionally, we checked on 700 randomly chosen entries in the dictionary that was automatically generated by our alignment tool. Results of this experiment also indicate that our strategy works for non-hapaxes as well, including finding multiple correct translations for the same head compound.

pdf
ATLAS – A New Text Alignment Architecture
Bettina Schrader
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions