Rens Bod

2017

A Data-Oriented Model of Literary Language
Andreas van Cranenburgh | Rens Bod
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We consider the task of predicting how literary a text is, with a gold standard from human ratings. Aside from a standard bigram baseline, we apply rich syntactic tree fragments, mined from the training set, and a series of hand-picked features. Our model is the first to distinguish degrees of highly and less literary novels using a variety of lexical and syntactic features, and explains 76.0% of the variation in literary ratings.

2016

POS-tagging of Historical Dutch
Dieuwke Hupkes | Rens Bod
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a study of the adequacy of current methods that are used for POS-tagging historical Dutch texts, as well as an exploration of the influence of employing different techniques to improve upon the current practice. The main focus of this paper is on (unsupervised) methods that are easily adaptable for different domains without requiring extensive manual input. It was found that modernising the spelling of corpora prior to tagging them with a tagger trained on contemporary Dutch results in a large increase in accuracy, but that spelling normalisation alone is not sufficient to obtain state-of-the-art results. The best results were achieved by training a POS-tagger on a corpus automatically annotated by projecting (automatically assigned) POS-tags via word alignments from a contemporary corpus. This result is promising, as it was reached without including any domain knowledge or context dependencies. We argue that the insights of this study combined with semi-supervised learning techniques for domain adaptation can be used to develop a general-purpose diachronic tagger for Dutch.

In this paper we describe FragmentSeeker, a tool capable of identifying all tree constructions that recur multiple times in a large phrase-structure treebank. The tool is based on an efficient kernel-based dynamic algorithm, which compares every pair of trees in a given treebank and computes the list of fragments they share. We describe the two notions of fragment we use, standard and partial fragments, and provide implementation details on how to extract them from a syntactically annotated corpus. We have tested our system on the Penn Wall Street Journal treebank, for which we present quantitative and qualitative analyses of the recurring structures obtained and report empirical running times. Finally, we propose ways in which our tool could contribute to research fields related to corpus analysis and processing, such as parsing, corpus statistics, annotation guidance, and automatic detection of argument structure.

2009

A generative re-ranking model for dependency parsing
Federico Sangati | Willem Zuidema | Rens Bod
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

2007

Unsupervised syntax-based machine translation: the contribution of discontiguous phrases
Rens Bod
Proceedings of Machine Translation Summit XI: Papers

A Linguistic Investigation into Unsupervised DOP
Rens Bod
Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition

Is the End of Supervised Parsing in Sight?
Rens Bod
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

Unsupervised Parsing with U-DOP
Rens Bod
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)

An All-Subtrees Approach to Unsupervised Parsing
Rens Bod
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2003

An efficient implementation of a new DOP model
Rens Bod
10th Conference of the European Chapter of the Association for Computational Linguistics

2001

What is the Minimal Set of Fragments that Achieves Maximal Parse Accuracy?
Rens Bod
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

An Improved Parser for Data-Oriented Lexical-Functional Analysis
Rens Bod
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

An Empirical Evaluation of LFG-DOP
Rens Bod
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

Parsing with the Shortest Derivation
Rens Bod
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

1998

Spoken Dialogue Interpretation with the DOP Model
Rens Bod
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

A Probabilistic Corpus-Driven Model for Lexical-Functional Analysis
Rens Bod | Ronald Kaplan
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

Spoken Dialogue Interpretation with the DOP Model
Rens Bod
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

A Probabilistic Corpus-Driven Model for Lexical-Functional Analysis
Rens Bod | Ronald Kaplan
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

1997

A DOP Model for Semantic Interpretation
Remko Bonnema | Rens Bod | Remko Scha
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

1996

Two Questions about Data-Oriented Parsing
Rens Bod
Fourth Workshop on Very Large Corpora

1995

The Problem of Computing the Most Probable Tree in Data-Oriented Parsing and Stochastic Tree Grammars
Rens Bod
Seventh Conference of the European Chapter of the Association for Computational Linguistics

1993

Monte Carlo Parsing
Rens Bod
Proceedings of the Third International Workshop on Parsing Technologies

In stochastic language processing, we are often interested in the most probable parse of an input string. Since there can be exponentially many parses, comparing all of them is not efficient. The Viterbi algorithm (Viterbi, 1967; Fujisaki et al., 1989) provides a tool to calculate in cubic time the most probable derivation of a string generated by a stochastic context-free grammar. However, in stochastic language models that allow a parse tree to be generated by different derivations – like Data Oriented Parsing (DOP) or Stochastic Lexicalized Tree-Adjoining Grammar (SLTAG) – the most probable derivation does not necessarily produce the most probable parse. In such cases, a Viterbi-style optimisation does not seem feasible to calculate the most probable parse. In the present article we show that by incorporating Monte Carlo techniques into a polynomial time parsing algorithm, the maximum probability parse can be estimated as accurately as desired in polynomial time. Monte Carlo parsing is not only relevant to DOP or SLTAG, but also provides for stochastic CFGs an interesting alternative to Viterbi. Unlike the current versions of Viterbi-style optimisation (Fujisaki et al., 1989; Jelinek et al., 1990; Wright et al., 1991), Monte Carlo parsing is not restricted to CFGs in Chomsky Normal Form. For stochastic grammars that are parsable in cubic time, the time complexity of estimating the most probable parse with Monte Carlo turns out to be O(n²ε⁻²), where n is the length of the input string and ε the estimation error. In this paper we will treat Monte Carlo parsing first of all in the context of the DOP model, since it is especially here that the number of derivations generating a single tree becomes dramatically large. Finally, a Monte Carlo Chart parser is used to test the DOP model on a set of hand-parsed strings from the Air Travel Information System (ATIS) spoken language corpus. Preliminary experiments indicate 96% test set parsing accuracy.

Using an Annotated Corpus as a Stochastic Grammar
Rens Bod
Sixth Conference of the European Chapter of the Association for Computational Linguistics
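
The "Monte Carlo Parsing" (1993) abstract in this list notes that, when one parse tree can be generated by several derivations, the most probable derivation need not yield the most probable parse, and that sampling can estimate the latter. The following is a minimal sketch of that idea only, not the paper's chart-based algorithm. The toy distribution `DERIVATIONS`, the tree labels "A" and "B", and all function names are hypothetical, and the derivations are enumerated explicitly here, which a real parser would avoid by sampling from a chart.

```python
import random
from collections import Counter

# Hypothetical toy distribution over derivations of one sentence.
# Each entry is (tree label, derivation probability); tree "B" has
# three derivations, tree "A" has one. Probabilities sum to 1.
DERIVATIONS = [
    ("A", 0.4),
    ("B", 0.2),
    ("B", 0.2),
    ("B", 0.2),
]


def tree_of_most_probable_derivation(derivations):
    """Return the tree produced by the single most probable derivation."""
    return max(derivations, key=lambda d: d[1])[0]


def exact_most_probable_parse(derivations):
    """Sum derivation probabilities per tree and return the best tree."""
    totals = Counter()
    for tree, prob in derivations:
        totals[tree] += prob
    return max(totals, key=totals.get)


def monte_carlo_most_probable_parse(derivations, n_samples, rng):
    """Sample derivations by probability; return the most frequent tree."""
    trees = [tree for tree, _ in derivations]
    weights = [prob for _, prob in derivations]
    sampled = rng.choices(trees, weights=weights, k=n_samples)
    return Counter(sampled).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the run is reproducible
best_derivation_tree = tree_of_most_probable_derivation(DERIVATIONS)
exact_tree = exact_most_probable_parse(DERIVATIONS)
estimated_tree = monte_carlo_most_probable_parse(DERIVATIONS, 1000, rng)

print("tree of most probable derivation:", best_derivation_tree)
print("exact most probable parse:", exact_tree)
print("Monte Carlo estimate:", estimated_tree)
```

In this toy case the single best derivation points to tree "A" (probability 0.4), while tree "B" collects 0.6 across its three derivations. The standard error of a sampled relative frequency shrinks like 1/√N, which is consistent with the ε⁻² factor the abstract reports for the number of samples needed.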