Mark-Jan Nederhof - ACL Anthology

This is an internal, incomplete preview of a proposed change to the ACL Anthology. For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes. Do not treat this content as an official publication.

Mark-Jan Nederhof

2025

A Topicality-Driven QUD Model for Discourse Processing
Yingxue Fu | Mark-Jan Nederhof | Anais Ollagnier
Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Question Under Discussion (QUD) is a discourse framework that has attracted growing interest in NLP in recent years. Among existing QUD models, the QUD tree approach (Riester, 2019) focuses on reconstructing QUDs and their hierarchical relationships, using a single tree to represent discourse structure. Prior implementation shows moderate inter-annotator agreement, highlighting the challenging nature of this task. In this paper, we propose a new QUD model for annotating hierarchical discourse structure. Our annotation achieves high inter-annotator agreement: 81.45% for short files and 79.53% for long files of Wall Street Journal articles. We show preliminary results on using GPT-4 for automatic annotation, which suggests that one of the best-performing LLMs still struggles with capturing hierarchical discourse structure. Moreover, we compare the annotations with RST annotations. Lastly, we present an approach for integrating hierarchical and local discourse relation annotations with the proposed model.

2021

Calculating the optimal step of arc-eager parsing for non-projective trees
Mark-Jan Nederhof
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

It is shown that the optimal next step of an arc-eager parser relative to a non-projective dependency structure can be calculated in cubic time, solving an open problem in parsing theory. Applications are in training of parsers by means of a ‘dynamic oracle’.

Automatic Classification of Human Translation and Machine Translation: A Study from the Perspective of Lexical Diversity
Yingxue Fu | Mark-Jan Nederhof
Proceedings for the First Workshop on Modelling Translation: Translatology in the Digital Age

2019

Calculating the Optimal Step in Shift-Reduce Dependency Parsing: From Cubic to Linear Time
Mark-Jan Nederhof
Transactions of the Association for Computational Linguistics, Volume 7

We present a new cubic-time algorithm to calculate the optimal next step in shift-reduce dependency parsing, relative to ground truth, commonly referred to as dynamic oracle. Unlike existing algorithms, it is applicable if the training corpus contains non-projective structures. We then show that for a projective training corpus, the time complexity can be improved from cubic to linear.

Regular transductions with MCFG input syntax
Mark-Jan Nederhof | Heiko Vogler
Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing

We show that regular transductions for which the input part is generated by some multiple context-free grammar can be simulated by synchronous multiple context-free grammars. We prove that synchronous multiple context-free grammars are strictly more powerful than this combination of regular transductions and multiple context-free grammars.

2017

Hybrid Grammars for Parsing of Discontinuous Phrase Structures and Non-Projective Dependency Structures
Kilian Gebhardt | Mark-Jan Nederhof | Heiko Vogler
Computational Linguistics, Volume 43, Issue 3 - September 2017

We explore the concept of hybrid grammars, which formalize and generalize a range of existing frameworks for dealing with discontinuous syntactic structures. Covered are both discontinuous phrase structures and non-projective dependency structures. Technically, hybrid grammars are related to synchronous grammars, where one grammar component generates linear structures and another generates hierarchical structures. By coupling lexical elements of both components together, discontinuous structures result. Several types of hybrid grammars are characterized. We also discuss grammar induction from treebanks. The main advantage over existing frameworks is the ability of hybrid grammars to separate discontinuity of the desired structures from time complexity of parsing. This permits exploration of a large variety of parsing algorithms for discontinuous structures, with different properties. This is confirmed by the reported experimental results, which show a wide variety of running time, accuracy, and frequency of parse failures.

2016

A short proof that O_2 is an MCFL
Mark-Jan Nederhof
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

DTED: Evaluation of Machine Translation Structure Using Dependency Parsing and Tree Edit Distance
Martin McCaffery | Mark-Jan Nederhof
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

Transition-based dependency parsing as latent-variable constituent parsing
Mark-Jan Nederhof
Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata

2015

Count-based State Merging for Probabilistic Regular Tree Grammars
Toni Dietze | Mark-Jan Nederhof
Proceedings of the 12th International Conference on Finite-State Methods and Natural Language Processing 2015 (FSMNLP 2015 Düsseldorf)

A Probabilistic Model of Ancient Egyptian Writing
Mark-Jan Nederhof | Fahrurrozi Rahman
Proceedings of the 12th International Conference on Finite-State Methods and Natural Language Processing 2015 (FSMNLP 2015 Düsseldorf)

2014

Hybrid Grammars for Discontinuous Parsing
Mark-Jan Nederhof | Heiko Vogler
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Deterministic Parsing using PCFGs
Mark-Jan Nederhof | Martin McCaffery
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

Proceedings of the 11th International Conference on Finite State Methods and Natural Language Processing
Mark-Jan Nederhof
Proceedings of the 11th International Conference on Finite State Methods and Natural Language Processing

2012

Synchronous Context-Free Tree Grammars
Mark-Jan Nederhof | Heiko Vogler
Proceedings of the 11th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+11)

2011

Computation of Infix Probabilities for Probabilistic Context-Free Grammars
Mark-Jan Nederhof | Giorgio Satta
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Splittability of Bilexical Context-Free Grammars is Undecidable
Mark-Jan Nederhof | Giorgio Satta
Computational Linguistics, Volume 37, Issue 4 - December 2011

Prefix Probability for Probabilistic Synchronous Context-Free Grammars
Mark-Jan Nederhof | Giorgio Satta
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Tree Parsing with Synchronous Tree-Adjoining Grammars
Matthias Büchse | Mark-Jan Nederhof | Heiko Vogler
Proceedings of the 12th International Conference on Parsing Technologies

Prefix Probabilities for Linear Context-Free Rewriting Systems
Mark-Jan Nederhof | Giorgio Satta
Proceedings of the 12th International Conference on Parsing Technologies

Parsing of Partially Bracketed Structures for Parse Selection
Mark-Jan Nederhof | Ricardo Sánchez-Sáez
Proceedings of the 12th International Conference on Parsing Technologies

Intersection for Weighted Formalisms
Mark-Jan Nederhof
Proceedings of the 9th International Workshop on Finite State Methods and Natural Language Processing

2010

Transforming Lexica as Trees
Mark-Jan Nederhof
Proceedings of the 2010 Workshop on Applications of Tree Automata in Natural Language Processing

2009

Automatic Creation of Interlinear Text for Philological Purposes
Mark-Jan Nederhof
Traitement Automatique des Langues, Volume 50, Numéro 2 : Langues anciennes [Ancient Languages]

Weighted parsing of trees
Mark-Jan Nederhof
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

2006

LEXUS, a web-based tool for manipulating lexical resources lexicon
Marc Kemps-Snijders | Mark-Jan Nederhof | Peter Wittenburg
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

LEXUS provides a flexible framework for the maintaining lexical structure and content. It is the first implementation of the Lexical Markup Framework model currently being developed at ISO TC37/SC4. Amongst its capabilities are the possibility to create lexicon structures, manipulate content and use of typed relations. Integration of well established Data Category Registries is supported to further promote interoperability by allowing access to well established linguistic concepts. Advanced linguistic functionality is offered to assist users in cross lexica operations such as search and comparison and merging of lexica. To enable use within various user groups the look and feel of each lexicon may be customized. In the near future more functionality will be added including integration with other tools accessing lexical content.

Estimation of Consistent Probabilistic Context-free Grammars
Mark-Jan Nederhof | Giorgio Satta
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

2005

A General Technique to Train Language Models on Language Models
Mark-Jan Nederhof
Computational Linguistics, Volume 31, Number 2, June 2005

2004

Kullback-Leibler Distance between Probabilistic Context-Free Grammars and Probabilistic Finite Automata
Mark-Jan Nederhof | Giorgio Satta
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Probabilistic Parsing Strategies
Mark-Jan Nederhof | Giorgio Satta
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

An Alternative Method of Training Probabilistic LR Parsers
Mark-Jan Nederhof | Giorgio Satta
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

Squibs and Discussions: Weighted Deductive Parsing and Knuth’s Algorithm
Mark-Jan Nederhof
Computational Linguistics, Volume 29, Number 1, March 2003

Probabilistic Parsing as Intersection
Mark-Jan Nederhof | Giorgio Satta
Proceedings of the Eighth International Conference on Parsing Technologies

We show that a well-known algorithm to compute the intersection of a context-fre language and a regular language can be extended to apply to a probabilistic context-free grammar and a probabilistic finite automaton, provided the two probabilistic models are combined through multiplication. The result is a probabilistic context-free grammar that contains joint information about the original grammar and automaton.

Partially Ordered Multiset Context-free Grammars and Free-word-order Parsing
Mark-Jan Nederhof | Giorgio Satta | Stuart Shieber
Proceedings of the Eighth International Conference on Parsing Technologies

We present a new formalism, partially ordered multiset context-free grammars (poms-CFG), along with an Earley-style parsing algorithm. The formalism, which can be thought of as a generalization of context-free grammars with partially ordered right-hand sides, is of interest in its own right, and also as infrastructure for obtaining tighter complexity bounds for more expressive context-free formalisms intended to express free or multiple word-order, such as ID/LP grammars. We reduce ID/LP grammars to poms-grammars, thereby getting finer-grained bounds on the parsing complexity of ID/LP grammars. We argue that in practice, the width of attested ID/LP grammars is small, yielding effectively polynomial time complexity for ID/LP grammar parsing.

2002

Parsing non-recursive CFGs
Mark-Jan Nederhof | Giorgio Satta
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

2001

Approximating Context-Free by Rational Transduction for Example-Based MT
Mark-Jan Nederhof
Proceedings of the ACL 2001 Workshop on Data-Driven Methods in Machine Translation

On the Complexity of Some Extensions of RCG Parsing
Eberhard Bertsch | Mark-Jan Nederhof
Proceedings of the Seventh International Workshop on Parsing Technologies

2000

Left-To-Right Parsing and Bilexical Context-Free Grammars
Mark-Jan Nederhof | Giorgio Satta
1st Meeting of the North American Chapter of the Association for Computational Linguistics

Practical Experiments with Regular Approximation of Context-Free Languages
Mark-Jan Nederhof
Computational Linguistics, Volume 26, Number 1, March 2000

1999

The Computational Complexity of the Correct-Prefix Property for TAGs
Mark-Jan Nederhof
Computational Linguistics, Volume 25, Number 3, September 1999

1998

An alternative LR algorithm for TAGs
Mark-Jan Nederhof
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

Prefix Probabilities from Stochastic Tree Adjoining Grammars
Mark-Jan Nederhof | Anoop Sarkar | Giorgio Satta
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

An Alternative LR Algorithm for TAGs
Mark-Jan Nederhof
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

Prefix Probabilities from Stochastic Free Adjoining Grammars
Mark-Jan Nederhof | Anoop Sarkar | Giorgio Satta
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

Prefix probabilities for linear indexed grammars
Mark-Jan Nederhof | Anoop Sarkar | Giorgio Satta
Proceedings of the Fourth International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+4)

Context-Free Parsing through Regular Approximation
Mark-Jan Nederhof
Finite State Methods in Natural Language Processing

1997

Regular Approximations of CFLs: A Grammatical View
Mark-Jan Nederhof
Proceedings of the Fifth International Workshop on Parsing Technologies

We show that for each context-free grammar a new grammar can be constructed that generates a regular language. This construction differs from existing methods of approximation in that use of a pushdown automaton is avoided . This allows better insight into how the generated language is affected. The new method is also more attractive from a computational viewpoint.

Grammatical analysis in the OVIS spoken-dialogue system
Mark-Jan Nederhof | Gosse Bouma | Rob Koeling | Gertjan van Noord
Interactive Spoken Dialog Systems: Bringing Speech and NLP Together in Real Applications

1996

Efficient Tabular LR Parsing
Mark-Jan Nederhof | Giorgio Satta
34th Annual Meeting of the Association for Computational Linguistics

1994

An Optimal Tabular Parsing Algorithm
Mark-Jan Nederhof
32nd Annual Meeting of the Association for Computational Linguistics

An Extended Theory of Head-Driven Parsing
Mark-Jan Nederhof | Giorgio Satta
32nd Annual Meeting of the Association for Computational Linguistics

1993

Increasing the Applicability of LR Parsing
Mark-Jan Nederhof | Janos J. Sarbo
Proceedings of the Third International Workshop on Parsing Technologies

In this paper we describe a phenomenon present in some context-free grammars, called hidden left recursion. We show that ordinary LR parsing according to hidden left-recursive grammars is not possible and we indicate a range of solutions to this problem. One of these solutions is a new parsing technique, which is a variant of traditional LR parsing. This new parsing technique can be used both with and without lookahead and the nondeterminism can be realized using backtracking or using a graph-structured stack.

Generalized Left-Corner Parsing
Mark-Jan Nederhof
Sixth Conference of the European Chapter of the Association for Computational Linguistics

Co-authors

Eberhard Bertsch 1

Matthias Büchse 1

Kilian Gebhardt 1

Marc Kemps-Snijders 1

Anaïs Ollagnier 1

Fahrurrozi Rahman 1

Janos J. Sarbo 1

Stuart M. Shieber 1

Ricardo Sánchez-Sáez 1

Peter Wittenburg 1

Gertjan van Noord 1

Venues