Mark-Jan Nederhof


2021

pdf
Automatic Classification of Human Translation and Machine Translation: A Study from the Perspective of Lexical Diversity
Yingxue Fu | Mark-Jan Nederhof
Proceedings for the First Workshop on Modelling Translation: Translatology in the Digital Age

pdf
Calculating the optimal step of arc-eager parsing for non-projective trees
Mark-Jan Nederhof
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

It is shown that the optimal next step of an arc-eager parser relative to a non-projective dependency structure can be calculated in cubic time, solving an open problem in parsing theory. Applications are in training of parsers by means of a ‘dynamic oracle’.

2019

pdf
Calculating the Optimal Step in Shift-Reduce Dependency Parsing: From Cubic to Linear Time
Mark-Jan Nederhof
Transactions of the Association for Computational Linguistics, Volume 7

We present a new cubic-time algorithm to calculate the optimal next step in shift-reduce dependency parsing, relative to ground truth, commonly referred to as dynamic oracle. Unlike existing algorithms, it is applicable if the training corpus contains non-projective structures. We then show that for a projective training corpus, the time complexity can be improved from cubic to linear.

pdf
Regular transductions with MCFG input syntax
Mark-Jan Nederhof | Heiko Vogler
Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing

We show that regular transductions for which the input part is generated by some multiple context-free grammar can be simulated by synchronous multiple context-free grammars. We prove that synchronous multiple context-free grammars are strictly more powerful than this combination of regular transductions and multiple context-free grammars.

2017

pdf bib
Hybrid Grammars for Parsing of Discontinuous Phrase Structures and Non-Projective Dependency Structures
Kilian Gebhardt | Mark-Jan Nederhof | Heiko Vogler
Computational Linguistics, Volume 43, Issue 3 - September 2017

We explore the concept of hybrid grammars, which formalize and generalize a range of existing frameworks for dealing with discontinuous syntactic structures. Covered are both discontinuous phrase structures and non-projective dependency structures. Technically, hybrid grammars are related to synchronous grammars, where one grammar component generates linear structures and another generates hierarchical structures. By coupling lexical elements of both components together, discontinuous structures result. Several types of hybrid grammars are characterized. We also discuss grammar induction from treebanks. The main advantage over existing frameworks is the ability of hybrid grammars to separate discontinuity of the desired structures from time complexity of parsing. This permits exploration of a large variety of parsing algorithms for discontinuous structures, with different properties. This is confirmed by the reported experimental results, which show a wide variety of running time, accuracy, and frequency of parse failures.

2016

pdf
A short proof that O_2 is an MCFL
Mark-Jan Nederhof
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
DTED: Evaluation of Machine Translation Structure Using Dependency Parsing and Tree Edit Distance
Martin McCaffery | Mark-Jan Nederhof
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf
Transition-based dependency parsing as latent-variable constituent parsing
Mark-Jan Nederhof
Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata

2015

pdf
Count-based State Merging for Probabilistic Regular Tree Grammars
Toni Dietze | Mark-Jan Nederhof
Proceedings of the 12th International Conference on Finite-State Methods and Natural Language Processing 2015 (FSMNLP 2015 Düsseldorf)

pdf
A Probabilistic Model of Ancient Egyptian Writing
Mark-Jan Nederhof | Fahrurrozi Rahman
Proceedings of the 12th International Conference on Finite-State Methods and Natural Language Processing 2015 (FSMNLP 2015 Düsseldorf)

2014

pdf
Deterministic Parsing using PCFGs
Mark-Jan Nederhof | Martin McCaffery
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Hybrid Grammars for Discontinuous Parsing
Mark-Jan Nederhof | Heiko Vogler
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

pdf bib
Proceedings of the 11th International Conference on Finite State Methods and Natural Language Processing
Mark-Jan Nederhof
Proceedings of the 11th International Conference on Finite State Methods and Natural Language Processing

2012

pdf
Synchronous Context-Free Tree Grammars
Mark-Jan Nederhof | Heiko Vogler
Proceedings of the 11th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+11)

2011

pdf
Tree Parsing with Synchronous Tree-Adjoining Grammars
Matthias Büchse | Mark-Jan Nederhof | Heiko Vogler
Proceedings of the 12th International Conference on Parsing Technologies

pdf
Prefix Probabilities for Linear Context-Free Rewriting Systems
Mark-Jan Nederhof | Giorgio Satta
Proceedings of the 12th International Conference on Parsing Technologies

pdf
Parsing of Partially Bracketed Structures for Parse Selection
Mark-Jan Nederhof | Ricardo Sánchez-Sáez
Proceedings of the 12th International Conference on Parsing Technologies

pdf bib
Intersection for Weighted Formalisms
Mark-Jan Nederhof
Proceedings of the 9th International Workshop on Finite State Methods and Natural Language Processing

pdf
Prefix Probability for Probabilistic Synchronous Context-Free Grammars
Mark-Jan Nederhof | Giorgio Satta
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf
Splittability of Bilexical Context-Free Grammars is Undecidable
Mark-Jan Nederhof | Giorgio Satta
Computational Linguistics, Volume 37, Issue 4 - December 2011

pdf
Computation of Infix Probabilities for Probabilistic Context-Free Grammars
Mark-Jan Nederhof | Giorgio Satta
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

pdf
Transforming Lexica as Trees
Mark-Jan Nederhof
Proceedings of the 2010 Workshop on Applications of Tree Automata in Natural Language Processing

2009

pdf bib
Weighted parsing of trees
Mark-Jan Nederhof
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

pdf
Automatic Creation of Interlinear Text for Philological Purposes
Mark-Jan Nederhof
Traitement Automatique des Langues, Volume 50, Numéro 2 : Langues anciennes [Ancient Languages]

2006

pdf
LEXUS, a web-based tool for manipulating lexical resources lexicon
Marc Kemps-Snijders | Mark-Jan Nederhof | Peter Wittenburg
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

LEXUS provides a flexible framework for the maintaining lexical structure and content. It is the first implementation of the Lexical Markup Framework model currently being developed at ISO TC37/SC4. Amongst its capabilities are the possibility to create lexicon structures, manipulate content and use of typed relations. Integration of well established Data Category Registries is supported to further promote interoperability by allowing access to well established linguistic concepts. Advanced linguistic functionality is offered to assist users in cross lexica operations such as search and comparison and merging of lexica. To enable use within various user groups the look and feel of each lexicon may be customized. In the near future more functionality will be added including integration with other tools accessing lexical content.

pdf
Estimation of Consistent Probabilistic Context-free Grammars
Mark-Jan Nederhof | Giorgio Satta
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

2005

pdf bib
A General Technique to Train Language Models on Language Models
Mark-Jan Nederhof
Computational Linguistics, Volume 31, Number 2, June 2005

2004

pdf
Probabilistic Parsing Strategies
Mark-Jan Nederhof | Giorgio Satta
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

pdf
An Alternative Method of Training Probabilistic LR Parsers
Mark-Jan Nederhof | Giorgio Satta
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

pdf
Kullback-Leibler Distance between Probabilistic Context-Free Grammars and Probabilistic Finite Automata
Mark-Jan Nederhof | Giorgio Satta
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

pdf
Squibs and Discussions: Weighted Deductive Parsing and Knuth’s Algorithm
Mark-Jan Nederhof
Computational Linguistics, Volume 29, Number 1, March 2003

pdf
Probabilistic Parsing as Intersection
Mark-Jan Nederhof | Giorgio Satta
Proceedings of the Eighth International Conference on Parsing Technologies

We show that a well-known algorithm to compute the intersection of a context-fre language and a regular language can be extended to apply to a probabilistic context-free grammar and a probabilistic finite automaton, provided the two probabilistic models are combined through multiplication. The result is a probabilistic context-free grammar that contains joint information about the original grammar and automaton.

pdf
Partially Ordered Multiset Context-free Grammars and Free-word-order Parsing
Mark-Jan Nederhof | Giorgio Satta | Stuart Shieber
Proceedings of the Eighth International Conference on Parsing Technologies

We present a new formalism, partially ordered multiset context-free grammars (poms-CFG), along with an Earley-style parsing algorithm. The formalism, which can be thought of as a generalization of context-free grammars with partially ordered right-hand sides, is of interest in its own right, and also as infrastructure for obtaining tighter complexity bounds for more expressive context-free formalisms intended to express free or multiple word-order, such as ID/LP grammars. We reduce ID/LP grammars to poms-grammars, thereby getting finer-grained bounds on the parsing complexity of ID/LP grammars. We argue that in practice, the width of attested ID/LP grammars is small, yielding effectively polynomial time complexity for ID/LP grammar parsing.

2002

pdf
Parsing non-recursive CFGs
Mark-Jan Nederhof | Giorgio Satta
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

2001

pdf
Approximating Context-Free by Rational Transduction for Example-Based MT
Mark-Jan Nederhof
Proceedings of the ACL 2001 Workshop on Data-Driven Methods in Machine Translation

pdf
On the Complexity of Some Extensions of RCG Parsing
Eberhard Bertsch | Mark-Jan Nederhof
Proceedings of the Seventh International Workshop on Parsing Technologies

2000

pdf
Left-To-Right Parsing and Bilexical Context-Free Grammars
Mark-Jan Nederhof | Giorgio Satta
1st Meeting of the North American Chapter of the Association for Computational Linguistics

pdf
Practical Experiments with Regular Approximation of Context-Free Languages
Mark-Jan Nederhof
Computational Linguistics, Volume 26, Number 1, March 2000

1999

pdf bib
The Computational Complexity of the Correct-Prefix Property for TAGs
Mark-Jan Nederhof
Computational Linguistics, Volume 25, Number 3, September 1999

1998

pdf
Prefix probabilities for linear indexed grammars
Mark-Jan Nederhof | Anoop Sarkar | Giorgio Satta
Proceedings of the Fourth International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+4)

pdf bib
Context-Free Parsing through Regular Approximation
Mark-Jan Nederhof
Finite State Methods in Natural Language Processing

pdf
An Alternative LR Algorithm for TAGs
Mark-Jan Nederhof
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

pdf
Prefix Probabilities from Stochastic Free Adjoining Grammars
Mark-Jan Nederhof | Anoop Sarkar | Giorgio Satta
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

pdf
An alternative LR algorithm for TAGs
Mark-Jan Nederhof
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

pdf
Prefix Probabilities from Stochastic Tree Adjoining Grammars
Mark-Jan Nederhof | Anoop Sarkar | Giorgio Satta
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

1997

pdf
Grammatical analysis in the OVIS spoken-dialogue system
Mark-Jan Nederhof | Gosse Bouma | Rob Koeling | Gertjan van Noord
Interactive Spoken Dialog Systems: Bringing Speech and NLP Together in Real Applications

pdf
Regular Approximations of CFLs: A Grammatical View
Mark-Jan Nederhof
Proceedings of the Fifth International Workshop on Parsing Technologies

We show that for each context-free grammar a new grammar can be constructed that generates a regular language. This construction differs from existing methods of approximation in that use of a pushdown automaton is avoided . This allows better insight into how the generated language is affected. The new method is also more attractive from a computational viewpoint.

1996

pdf
Efficient Tabular LR Parsing
Mark-Jan Nederhof | Giorgio Satta
34th Annual Meeting of the Association for Computational Linguistics

1994

pdf
An Optimal Tabular Parsing Algorithm
Mark-Jan Nederhof
32nd Annual Meeting of the Association for Computational Linguistics

pdf
An Extended Theory of Head-Driven Parsing
Mark-Jan Nederhof | Giorgio Satta
32nd Annual Meeting of the Association for Computational Linguistics

1993

pdf
Generalized Left-Corner Parsing
Mark-Jan Nederhof
Sixth Conference of the European Chapter of the Association for Computational Linguistics

pdf
Increasing the Applicability of LR Parsing
Mark-Jan Nederhof | Janos J. Sarbo
Proceedings of the Third International Workshop on Parsing Technologies

In this paper we describe a phenomenon present in some context-free grammars, called hidden left recursion. We show that ordinary LR parsing according to hidden left-recursive grammars is not possible and we indicate a range of solutions to this problem. One of these solutions is a new parsing technique, which is a variant of traditional LR parsing. This new parsing technique can be used both with and without lookahead and the nondeterminism can be realized using backtracking or using a graph-structured stack.