Helmut Schmid


2020

The LMU Munich System for the WMT20 Very Low Resource Supervised MT Task
Jindřich Libovický | Viktor Hangya | Helmut Schmid | Alexander Fraser
Proceedings of the Fifth Conference on Machine Translation

We present our systems for the WMT20 Very Low Resource MT Task for translation between German and Upper Sorbian. For training our systems, we generate synthetic data by both back- and forward-translation. Additionally, we enrich the training data with German-Czech data translated from Czech to Upper Sorbian by an unsupervised statistical MT system incorporating orthographically similar word pairs and transliterations of OOV words. Our best translation system between German and Sorbian is based on transfer learning from a Czech-German system and scores 12 to 13 BLEU higher than a baseline system built using the available parallel data only.

Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification
Timo Schick | Helmut Schmid | Hinrich Schütze
Proceedings of the 28th International Conference on Computational Linguistics

A recent approach for few-shot text classification is to convert textual inputs to cloze questions that contain some form of task description, process them with a pretrained language model and map the predicted words to labels. Manually defining this mapping between words and labels requires both domain expertise and an understanding of the language model’s abilities. To mitigate this issue, we devise an approach that automatically finds such a mapping given small amounts of training data. For a number of tasks, the mapping found by our approach performs almost as well as hand-crafted label-to-word mappings.
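
As a rough illustration of what "automatically finding a mapping" between labels and words can look like, the sketch below picks, for each label, the candidate word whose language-model log-probability is highest on that label's training examples relative to all other examples. The function name `find_label_words`, the input layout, and the scoring rule are assumptions made for this example; they are a simplified stand-in, not the objective used in the paper, and the log-probabilities are assumed to have been computed beforehand by some pretrained language model.

```python
def find_label_words(word_logprobs, labels):
    """Pick one label word per label from precomputed scores.

    word_logprobs: {word: [log-probability of the word at the cloze
                    position, one value per training example]}
    labels:        list with the gold label of each training example
    (Both inputs are hypothetical; no language model is called here.)
    """
    mapping = {}
    for label in sorted(set(labels)):

        def score(word):
            # Mean log-probability on examples of this label ...
            inside = [lp for lp, y in zip(word_logprobs[word], labels) if y == label]
            # ... minus the mean on all remaining examples.
            outside = [lp for lp, y in zip(word_logprobs[word], labels) if y != label]
            inside_mean = sum(inside) / len(inside)
            outside_mean = sum(outside) / len(outside) if outside else 0.0
            return inside_mean - outside_mean

        mapping[label] = max(word_logprobs, key=score)
    return mapping


# Toy scores for four training examples (invented for illustration).
toy_labels = ["pos", "neg", "pos", "neg"]
toy_scores = {
    "great": [-1.0, -5.0, -1.5, -6.0],
    "terrible": [-6.0, -1.0, -5.0, -1.2],
    "the": [-2.0, -2.0, -2.0, -2.0],
}
toy_mapping = find_label_words(toy_scores, toy_labels)
```

On the toy scores, "great" is selected for `pos` and "terrible" for `neg`, while the uninformative word "the" scores zero for both labels.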

2017

Statistical Models for Unsupervised, Semi-Supervised, and Supervised Transliteration Mining
Hassan Sajjad | Helmut Schmid | Alexander Fraser | Hinrich Schütze
Computational Linguistics, Volume 43, Issue 2 - June 2017

We present a generative model that efficiently mines transliteration pairs in a consistent fashion in three different settings: unsupervised, semi-supervised, and supervised transliteration mining. The model interpolates two sub-models, one for the generation of transliteration pairs and one for the generation of non-transliteration pairs (i.e., noise). The model is trained on noisy unlabeled data using the EM algorithm. During training the transliteration sub-model learns to generate transliteration pairs and the fixed non-transliteration model generates the noise pairs. After training, the unlabeled data is disambiguated based on the posterior probabilities of the two sub-models. We evaluate our transliteration mining system on data from a transliteration mining shared task and on parallel corpora. For three out of four language pairs, our system outperforms all semi-supervised and supervised systems that participated in the NEWS 2010 shared task. On word pairs extracted from parallel corpora with fewer than 2% transliteration pairs, our system achieves up to 86.7% F-measure with 77.9% precision and 97.8% recall.
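
The interpolation of a learned transliteration sub-model with a fixed noise sub-model can be sketched in a few lines. The toy version below is heavily simplified and rests on assumptions that are not from the abstract: source and target words have equal length and are aligned character by character, the transliteration sub-model is a unigram distribution over aligned character pairs, and the noise sub-model generates source and target characters independently. The name `train_mining_model` and its parameters are hypothetical.

```python
from collections import Counter
import math


def train_mining_model(pairs, iterations=20, prior_noise=0.5):
    """EM for a two-component mixture over word pairs (toy sketch).

    pairs: list of (source, target) strings of equal length (assumption).
    Returns the transliteration posterior of each pair from the last
    E-step and the final noise weight.
    """
    # Fixed noise sub-model: independent source and target character unigrams.
    src_counts = Counter(c for s, _ in pairs for c in s)
    tgt_counts = Counter(c for _, t in pairs for c in t)
    n_src, n_tgt = sum(src_counts.values()), sum(tgt_counts.values())

    def noise_prob(s, t):
        return math.prod(
            (src_counts[a] / n_src) * (tgt_counts[b] / n_tgt) for a, b in zip(s, t)
        )

    # Transliteration sub-model: starts uniform over observed character pairs.
    units = {u for s, t in pairs for u in zip(s, t)}
    translit = {u: 1.0 / len(units) for u in units}
    lam = prior_noise  # interpolation weight of the noise sub-model
    posteriors = []

    for _ in range(iterations):
        # E-step: posterior probability that each pair is a transliteration.
        posteriors = []
        for s, t in pairs:
            p_tr = (1.0 - lam) * math.prod(translit[u] for u in zip(s, t))
            p_no = lam * noise_prob(s, t)
            posteriors.append(p_tr / (p_tr + p_no))

        # M-step: re-estimate only the transliteration sub-model and the weight.
        counts = Counter()
        for (s, t), w in zip(pairs, posteriors):
            for u in zip(s, t):
                counts[u] += w
        total = sum(counts.values())
        if total == 0:  # every pair was judged to be noise; stop updating
            break
        translit = {u: counts[u] / total for u in units}
        lam = 1.0 - sum(posteriors) / len(pairs)

    return posteriors, lam


toy_pairs = [("abc", "xyz"), ("abd", "xyw"), ("cab", "zxy"), ("abc", "wzx")]
toy_posteriors, toy_lambda = train_mining_model(toy_pairs)
```

After training, a pair would be labeled a transliteration when its posterior exceeds a chosen threshold such as 0.5; the threshold is likewise an assumption of this sketch.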

2015

The Operation Sequence Model—Combining N-Gram-Based and Phrase-Based Statistical Machine Translation
Nadir Durrani | Helmut Schmid | Alexander Fraser | Philipp Koehn | Hinrich Schütze
Computational Linguistics, Volume 41, Issue 2 - June 2015

2014

Investigating the Usefulness of Generalized Word Representations in SMT
Nadir Durrani | Philipp Koehn | Helmut Schmid | Alexander Fraser
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Dependency parsing with latent refinements of part-of-speech tags
Thomas Mueller | Richard Farkas | Alex Judea | Helmut Schmid | Hinrich Schuetze
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

Efficient Higher-Order CRFs for Morphological Tagging
Thomas Mueller | Helmut Schmid | Hinrich Schütze
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Model With Minimal Translation Units, But Decode With Phrases
Nadir Durrani | Alexander Fraser | Helmut Schmid
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Munich-Edinburgh-Stuttgart Submissions of OSM Systems at WMT13
Nadir Durrani | Alexander Fraser | Helmut Schmid | Hassan Sajjad | Richárd Farkas
Proceedings of the Eighth Workshop on Statistical Machine Translation

QCRI-MES Submission at WMT13: Using Transliteration Mining to Improve Statistical Machine Translation
Hassan Sajjad | Svetlana Smekalova | Nadir Durrani | Alexander Fraser | Helmut Schmid
Proceedings of the Eighth Workshop on Statistical Machine Translation

Munich-Edinburgh-Stuttgart Submissions at WMT13: Morphological and Syntactic Processing for SMT
Marion Weller | Max Kisselew | Svetlana Smekalova | Alexander Fraser | Helmut Schmid | Nadir Durrani | Hassan Sajjad | Richárd Farkas
Proceedings of the Eighth Workshop on Statistical Machine Translation

Knowledge Sources for Constituent Parsing of German, a Morphologically Rich and Less-Configurational Language
Alexander Fraser | Helmut Schmid | Richárd Farkas | Renjing Wang | Hinrich Schütze
Computational Linguistics, Volume 39, Issue 1 - March 2013

Can Markov Models Over Minimal Translation Units Help Phrase-Based SMT?
Nadir Durrani | Alexander Fraser | Helmut Schmid | Hieu Hoang | Philipp Koehn
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

A Statistical Model for Unsupervised and Semi-supervised Transliteration Mining
Hassan Sajjad | Alexander Fraser | Helmut Schmid
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Comparative Investigation of Morphological Language Modeling for the Languages of the European Union
Thomas Mueller | Hinrich Schuetze | Helmut Schmid
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Forest Reranking through Subtree Ranking
Richárd Farkas | Helmut Schmid
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Data-driven Dependency Parsing With Empty Heads
Wolfgang Seeker | Richárd Farkas | Bernd Bohnet | Helmut Schmid | Jonas Kuhn
Proceedings of COLING 2012: Posters

Dependency Parsing of Hungarian: Baseline Results and Challenges
Richárd Farkas | Veronika Vincze | Helmut Schmid
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

An Algorithm for Unsupervised Transliteration Mining with an Application to Word Alignment
Hassan Sajjad | Alexander Fraser | Helmut Schmid
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

A Joint Sequence Translation Model with Integrated Reordering
Nadir Durrani | Helmut Schmid | Alexander Fraser
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Comparing Two Techniques for Learning Transliteration Models Using a Parallel Corpus
Hassan Sajjad | Nadir Durrani | Helmut Schmid | Alexander Fraser
Proceedings of 5th International Joint Conference on Natural Language Processing

Features for Phrase-Structure Reranking from Dependency Parses
Richárd Farkas | Bernd Bohnet | Helmut Schmid
Proceedings of the 12th International Conference on Parsing Technologies

2010

Design and Application of a Gold Standard for Morphological Analysis: SMOR as an Example of Morphological Evaluation
Gertrud Faaß | Ulrich Heid | Helmut Schmid
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes general requirements for evaluating and documenting NLP tools, with a focus on morphological analysers and the design of a Gold Standard. It is argued that any evaluation must be measurable and that its documentation must be accessible to every user of the tool. The documentation must enable the user to compare different tools offering the same service; hence the descriptions must contain measurable values. A Gold Standard is a vital part of any measurable evaluation process; we therefore report on the corpus-based design of a Gold Standard, its creation, and the problems that occur. Our project concentrates on SMOR, a morphological analyser for German that is to be offered as a web service. We not only use this analyser for designing the Gold Standard, but also evaluate the tool itself at the same time. Note that the project is ongoing, so we cannot yet present final results.

A Corpus Representation Format for Linguistic Web Services: The D-SPIN Text Corpus Format and its Relationship with ISO Standards
Ulrich Heid | Helmut Schmid | Kerstin Eckart | Erhard Hinrichs
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In preparing linguistic web services for corpus processing, we saw the need for a representation format that supports interoperability between different web services in a corpus processing pipeline and also provides a well-defined interface both to legacy tools and their data formats and to upcoming international standards. We present the D-SPIN text corpus format, TCF, which was designed for this purpose. It is a stand-off XML format, inspired by the philosophy of the emerging standards LAF (Linguistic Annotation Framework) and its "instances" MAF for morpho-syntactic annotation and SynAF for syntactic annotation. Tools for the exchange with existing (best practice) formats are available, and a converter from MAF to TCF is being tested in spring 2010. We describe the usage scenario in which TCF is embedded and the properties and architecture of TCF. We also give examples of TCF-encoded data and describe the aspects of syntactic and semantic interoperability already addressed.

Hindi-to-Urdu Machine Translation through Transliteration
Nadir Durrani | Hassan Sajjad | Alexander Fraser | Helmut Schmid
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

Tagging Urdu Text with Parts of Speech: A Tagger Comparison
Hassan Sajjad | Helmut Schmid
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

2008

Estimation of Conditional Probabilities With Decision Trees and an Application to Fine-Grained POS Tagging
Helmut Schmid | Florian Laws
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Combining EM Training and the MDL Principle for an Automatic Verb Classification Incorporating Selectional Preferences
Sabine Schulte im Walde | Christian Hying | Christian Scheible | Helmut Schmid
Proceedings of ACL-08: HLT

2007

Phonological Constraints and Morphological Preprocessing for Grapheme-to-Phoneme Conversion
Vera Demberg | Helmut Schmid | Gregor Möhler
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

Trace Prediction and Recovery with Unlexicalized PCFGs and Slash Features
Helmut Schmid
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2005

Disambiguation of Morphological Structure using a PCFG
Helmut Schmid
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

Efficient Parsing of Highly Ambiguous Context-Free Grammars with Bit Vectors
Helmut Schmid
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

New Statistical Methods for Phrase Break Prediction
Helmut Schmid | Michaela Atterer
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

SMOR: A German Computational Morphology Covering Derivation, Composition and Inflection
Helmut Schmid | Arne Fitschen | Ulrich Heid
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

A Generative Probability Model for Unification-Based Grammars
Helmut Schmid
COLING 2002: The 19th International Conference on Computational Linguistics

Lexicalization of Probabilistic Grammars
Helmut Schmid
COLING 2002: The 19th International Conference on Computational Linguistics

2001

Parse Forest Computation of Expected Governors
Helmut Schmid | Mats Rooth
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

Robust German Noun Chunking With a Probabilistic Context-Free Grammar
Helmut Schmid | Sabine Schulte im Walde
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

1997

Parsing by Successive Approximation
Helmut Schmid
Proceedings of the Fifth International Workshop on Parsing Technologies

It is proposed to parse feature structure-based grammars in several steps. Each step aims to eliminate as many invalid analyses as possible, as efficiently as possible. To this end, the set of feature constraints is divided into three subsets: a set of context-free constraints, a set of filtering constraints, and a set of structure-building constraints, which are solved in that order. The best processing strategy differs for each subset: context-free constraints are solved efficiently with one of the well-known algorithms for context-free parsing. Filtering constraints can be solved using unification algorithms for non-disjunctive feature structures, whereas structure-building constraints require special techniques to represent feature structures with embedded disjunctions efficiently. A compilation method and an efficient processing strategy for filtering constraints are presented.

1994

Part-of-Speech Tagging With Neural Networks
Helmut Schmid
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics