pdf
bib
Proceedings of the Sixth Internatonal Workshop on Parsing Technologies
pdf
bib
abs
Automatic Grammar Induction: Combining, Reducing and Doing Nothing
Eric Brill

John C. Henderson

Grace Ngai
This paper surveys three research directions in parsing. First, we look at methods for both automatically generating a set of diverse parsers and combining the outputs of different parsers into a single parse. Next, we will discuss a parsing method known as transformationbased parsing. This method, though less accurate than the best current corpusderived parsers, is able to parse quite accurately while learning only a small set of easily understood rules, as opposed to the manymegabyte parameter files learned by other techniques. Finally, we review a recent study exploring how people and machines compare at the task of creating a program to automatically annotate noun phrases.
pdf
bib
abs
Guides and Oracles for LinearTime Parsing
Martin Kay
If chart parsing is taken to include the process of reading out solutions one by one, then it has exponential complexity. The stratagem of separating readout from chart construction can also be applied to other kinds of parser, in particular, to leftcomer parsers that use early composition. When a limit is placed on the size of the stack in such a parser, it becomes contextfree equivalent. However, it is not practical to profit directly from this observation because of the large state sets that are involved in otherwise ordinary situations. It may be possible to overcome these problems by means of a guide constructed from a weakened version of the initial grammar.
pdf
Parsing Techniques for Lexicalized ContextFree Grammars
Giorgio Satta
pdf
abs
A Bootstrapping Approach to Parser Development
Izaskun Aldezabal

Koldo Gojenola

Kepa Sarasola
This paper presents a robust parsing system for unrestricted Basque texts. It analyzes a sentence in two stages: a unificationbased parser builds basic syntactic units such as NPs, PPs, and sentential complements, while a finitestate parser performs syntactic disambiguation and filtering of the results. The system has been applied to the acquisition of verbal subcategorization information, obtaining 66% recall and 87% precision in the determination of verb subcategorization instances. This information will be later incorporated to the parser, in order to improve its performance.
pdf
abs
New Tabular Algorithms for Parsing
Miguel A. Alonso

Jorge Graña

Manuel Vilares

Eric de la Clergerie
We develop a set of new tabular parsing algorithms for Linear Indexed Grammars, including bottomup algorithms and Earleylike algorithms with and without the valid prefix property, creating a continuum in which one algorithm can in turn be derived from another. The output of these algorithms is a shared forest in the form of a contextfree grammar that encodes all possible derivations for a given input string.
pdf
abs
Customizable Modular Lexicalized Parsing
R. Basili

M. T. Pazienza

F. M. Zanzotto
Different NLP applications have different efficiency constraints (i.e. quality of the results and throughput) that reflect on each core linguistic component. Syntactic processors are basic modules in some NLP application. A customization that permits the performance control of these components enables their reuse in different application scenarios. Throughput has been commonly improved using partial syntactic processors. On the other hand, specialized lexicons are generally employed to improve the quality of the syntactic material produced by specific parsing (sub)process (e.g. verb argument detection or PP attachment disambiguation) . Building upon the idea of grammar stratification, in this paper a method to push modularity and lexical sensitivity, in parsing, in view of customizable syntactic analysers is presented. A framework for modular parser design is proposed and its main properties are discussed. Parsers (i.e. different parsing module chains) are then presented and their performances are analyzed in an applicationdriven scenarios.
pdf
abs
Range Concatenation Grammars
Pierre Boullier
In this paper we present Range Concatenation Grammars, a syntactic formalism which possesses many attractive features among which we underline here, power and closure properties. For example, Range Concatenation Grammars are more powerful than Linear ContextFree Rewriting Systems though this power is not reached to the detriment of efficiency since its sentences can always be parsed in polynomial time. Range Concatenation Languages are closed both under intersection and complementation and these closure properties may allow to consider novel ways to describe some linguistic processings. We also present a parsing algorithm which is the basis of our current prototype implementation.
pdf
abs
Automated Extraction of TAGs from the Penn Treebank
John Chen

K. VijayShanker
The accuracy of statistical parsing models can be improved with the use of lexical information. Statistical parsing using Lexicalized tree adjoining grammar (LTAG), a kind of lexicalized grammar, has remained relatively unexplored. We believe that is largely in part due to the absence of large corpora accurately bracketed in terms of a perspicuous yet broad coverage LTAG. Our work attempts to alleviate this difficulty. We extract different LTAGs from the Penn Treebank. We show that certain strategies yield an improved extracted LTAG in terms of compactness, broad coverage, and supertagging accuracy. Furthermore, we perform a preliminary investigation in smoothing these grammars by means of an external linguistic resource, namely, the tree families of an XTAG grammar, a hand built grammar of English.
pdf
abs
From Cases to Rules and Vice Versa: Robust Practical Parsing With Analogy
Alex Chengyu Fang
This article describes the architecture of the Survey Parser and discusses two major components related to the analogybased parsing of unrestricted English. Firstly, it discusses the automatic generation of a large declarative formal grammar from a corpus that has been syntactically analysed. Secondly, it describes analogybased parsing that employs both the automatically learned rules and the database of cases to determine the syntactic structure of the input string. Statistics are presented to characterise the performance of the parsing system.
pdf
abs
A Transformationbased Parsing Technique With Anytime Properties
Kilian Foth

Ingo Schröder

Wolfgang Menzel
A transformationbased approach to robust parsing is presented, which achieves a strictly monotonic improvement of its current best hypothesis by repeatedly applying local repair steps to a complex multilevel representation. The transformation process is guided by scores derived from weighted constraints. Besides being interruptible, the procedure exhibits a performance profile typical for anytime procedures and holds great promise for the implementation of timeadaptive behaviour.
pdf
abs
SOUP: A Parser for Realworld Spontaneous Speech
Marsal Gavaldà
This paper describes the key features of SOUP, a stochastic, chartbased, topdown parser, especially engineered for realtime analysis of spoken language with very large, multidomain semantic grammars. SOUP achieves flexibility by encoding contextfree grammars, specified for example in the Java Speech Grammar Format, as probabilistic recursive transition networks, and robustness by allowing skipping of input words at any position and producing ranked interpretations that may consist of multiple parse trees. Moreover, SOUP is very efficient, which allows for practically instantaneous backend response.
pdf
abs
A Recognizer for Minimalist Grammars
Henk Harkema
Minimalist Grammars are a rigorous formalization of the sort of grammars proposed in the linguistic framework of Chomsky’s Minimalist Program. One notable property of Minimalist Grammars is that they allow constituents to move during the derivation of a sentence, thus creating discontinuous constituents. In this paper we will present a bottomup parsing method for Minimalist Grammars, prove its correctness, and discuss its complexity.
pdf
abs
A Neural Network Parser that Handles Sparse Data
James Henderson
Previous work has demonstrated the viability of a particular neural network architecture, Simple Synchrony Networks, for syntactic parsing. Here we present additional results on the performance of this type of parser, including direct comparisons on the same dataset with a standard statistical parsing method, Probabilistic Context Free Grammars. We focus these experiments on demonstrating one of the main advantages of the SSN parser over the PCFG, handling sparse data. We use smaller datasets than are typically used with statistical methods, resulting in the PCFG finding parses for under half of the test sentences, while the SSN finds parses for all sentences. Even on the PCFG ‘s parsed half, the SSN performs better than the PCFG, as measure by recall and precision on both constituents and a dependencylike measure.
pdf
abs
A Contextfree Approximation of Headdriven Phrase Structure Grammar
Bernd Kiefer

HansUlrich Krieger
We present a contextfree approximation of unificationbased grammars, such as HPSG or PATRII. The theoretical underpinning is established through a least fixpoint construction over a certain monotonic function. In order to reach a finite fixpoint, the concrete implementation can be parameterized in several ways , either by specifying a finite iteration depth, by using different restrictors, or by making the symbols of the CFG more complex adding annotations a la GPSG. We also present several methods that speed up the approximation process and help to limit the size of the resulting CF grammar.
pdf
abs
Optimal Ambiguity Packing in Contextfree Parsers with Interleaved Unification
Alon Lavie

Carolyn Penstein Rosé
Ambiguity packing is a well known technique for enhancing the efficiency of contextfree parsers. However, in the case of unificationaugmented contextfree parsers where parsing is interleaved with feature unification, the propagation of feature structures imposes difficulties on the ability of the parser to effectively perform ambiguity packing. We demonstrate that a clever heuristic for prioritizing the execution order of grammar rules and parsing actions can achieve a high level of ambiguity packing that is provably optimal. We present empirical evaluations of the proposed technique, performed with both a Generalized LR parser and a chart parser, that demonstrate its effectiveness.
pdf
abs
Extended Partial Parsing for Lexicalized Tree Grammars
Patrice Lopez
Existing parsing algorithms for Lexicalized Tree Grammars (LTG) formalisms (LTAG, TIG, DTG, ... ) are adaptations of algorithms initially dedicated to Context Free Grammars (CFG). They do not really take into account the fact that we do not use context free rules but partial parsing trees that we try to combine. Moreover the lexicalization raises up the important problem of multiplication of structures, a problem which does not exist in CFG. This paper presents parsing techniques for LTG taking into account these two fundamental features. Our approach focuses on robust and pratical purposes. Our parsing algorithm results in more extended partial parsing when the global parsing fails and in an interesting average complexity compared with others bottomup algorithms.
pdf
abs
Improved Leftcorner Chart Parsing for Large Contextfree Grammars
Robert C. Moore
We develop an improved form of leftcorner chart parsing for large contextfree grammars, introducing improvements that result in significant speedups more compared to previouslyknown variants of left corner parsing. We also compare our method to several other major parsing approaches, and find that our improved leftcorner parsing method outperforms each of these across a range of grammars. Finally, we also describe a new technique for minimizing the extra information needed to efficiently recover parses from the data structures built in the course of parsing.
pdf
abs
Measure for Measure: Parser Crossfertilization  Towards Increased Component Comparability and Exchange
Stephan Oepen

Ulrich Callmeier
Over the past few years significant progress was accomplished in efficient processing with widecoverage HPSG grammars. HPSGbased parsing systems are now available that can process mediumcomplexity sentences (of ten to twenty words, say) in average parse times equivalent to real (i.e. human reading) time. A large number of engineering improvements in current HPSG systems were achieved through collaboration of multiple research centers and mutual exchange of experience, encoding techniques, algorithms, and even pieces of software. This article presents an approach to grammar and system engineering, termed competence & performance profiling, that makes systematic experimentation and the precise empirical study of system properties a focal point in development. Adapting the profiling metaphor familiar from software engineering to constraintbased grammars and parsers, enables developers to maintain an accurate record of system evolution, identify grammar and system deficiencies quickly, and compare to earlier versions or between different systems. We discuss a number of exemplary problems that motivate the experimental approach, and apply the empirical methodology in a fairly detailed discussion of what was achieved during a development period of three years. Given the collaborative nature in setup, the empirical results we present involve research and achievements of a large group of people.
pdf
abs
Computing the Most Probable Parse for a Discontinuous Phrase Structure Grammar
Oliver Plaehn
This paper presents a probabilistic extension of Discontinuous Phrase Structure Grammar (DPSG), a formalism designed to describe discontinuous constituency phenomena adequately and perspicuously by means of trees with crossing branches. We outline an implementation of an agendabased chart parsing algorithm that is capable of computing the Most Probable Parse for a given input sentence for probabilistic versions of both DPSG and ContextFree Grammar. Experiments were conducted with both types of grammars extracted from the NEGRA corpus. In spite of the much greater complexity of DPSG parsing in terms of the number of (partial) analyses that can be constructed for an input sentence, accuracy results from both experiments are comparable. We also briefly hint at future lines of research aimed at more efficient ways of probabilistic parsing with discontinuous constituents.
pdf
abs
An Efficient LR Parser Generator for Tree Adjoining Grammars
Carlos A. Prolo
The first published LR algorithm for Tree Adjoining Grammars (TAGs [Joshi and Schabes, 1996]) was due to Schabes and VijayShanker [1990] . Nederhof [1998] showed that it was incorrect (after [Kinyon, 1997]), and proposed a new one. Experimenting with his new algorithm over the XTAG English Grammar [XTAG Research Group, 1998] he concluded that LR parsing was inadequate for use with reasonably sized grammars because the size of the generated table was unmanageable. Also the degree of conflicts is too high. In this paper we discuss issues involved with LR parsing for TAGs and propose a new version of the algorithm that, by maintaining the degree of prediction while deferring the “subtree reduction”, dramatically reduces both the average number of conflicts per state and the size of the parser.
pdf
abs
Parsing Scrambling with Path Set: a Graded Grammaticality Approach
Siamak Rezaei
In this work we introduce the notion of path set for parsing free word order languages. The parsing system uses this notion to parse examples of sentences with scrambling. We show that by using path set, the performance constraints on scrambling such as Resource Limitation Principle (RLP) can be represented easily. Our work contrasts with models based on the notion of immediate dominance rule and binary precedence relations. In our work the precedence relations and word order constraints are defined locally for each clause. Our binary precedence relations are examples of fuzzy relations with weights attached to them. As a result, the word order principles in our approach can be violated and each violation contributes to a lowering of the overall acceptability and grammaticality. The work suggests a robust principlebased approach to parsing ambiguous sentences in verb final languages.
pdf
abs
On the Use of Grammar Based Language Models for Statistical Machine Translation
Hassan Sawaf

Kai Schütz

Hermann Ney
In this paper, we describe some concepts of language models beyond the usually used standard trigram and use such language models for statistical machine translation. In statistical machine translation the language model is the apriori knowledge source of the system about the target language. One important requirement for the language model is the correct word order, given a certain choice of words, and to score the translations generated by the translation model Pr(f_{1}^{J}/e^{I}_{1}), in view of the syntactic context. In addition to standard mgrams with long histories, we examine the use of PartofSpeech based models as well as linguistically motivated grammars with stochastic parsing as a special type of language model. Translation results are given on the VERBMOBIL task, where translation is performed from German to English, with vocabulary sizes of 6500 and 4000 words, respectively.
pdf
abs
Algebraic Construction of Parsing Schemata
KarlMichael Schneider
We propose an algebraic method for the design of tabular parsing algorithms which uses parsing schemata [7]. The parsing strategy is expressed in a tree algebra. A parsing schema is derived from the tree algebra by means of algebraic operations such as homomorphic images, direct products, subalgebras and quotient algebras. The latter yields a tabular interpretation of the parsing strategy. The proposed method allows simpler and more elegant correctness proofs by using general theorems and is not limited to leftright parsing strategies, unlike current automatonbased approaches. Furthermore, it allows to derive parsing schemata for linear indexed grammars (LIG) from parsing schemata for contextfree grammars by means of a correctness preserving algebraic transformation. A new bottomup head corner parsing schema for LIG is constructed to demonstrate the method.
pdf
abs
A Spanish POS Tagger with Variable Memory
José Triviño

Rafael MoralesBueno
An implementation of a Spanish POS tagger is described in this paper. This implementation combines three basic approaches: a single word tagger based on decision trees, a POS tagger based on variable memory Markov models, and a feature structures set of tags. Using decision trees for single word tagging allows the tagger to work without a lexicon that lists only possible tags. Moreover, it decreases the error rate because there are no unknown words. The feature structure set of tags is advantageous when the available training corpus is small and the tag set large, which can be the case with morphologically rich languages like Spanish. Finally, variable memory Markov models training is more efficient than traditional fullorder Markov models and achieves better accuracy. In this implementation, 98.58% of tokens are correctly classified.
pdf
abs
Parsing a Lattice with Multiple Grammars
Fuliang Weng

Helen Meng

Po Chui Luk
Efficiency, memory, ambiguity, robustness and scalability are the central issues in natural language parsing. Because of the complexity of natural language, different parsers may be suited only to certain subgrammars. In addition, grammar maintenance and updating may have adverse effects on tuned parsers. Motivated by these concerns, [25] proposed a grammar partitioning and topdown parser composition mechanism for loosely restricted ContextFree Grammars (CFGs). In this paper, we report on significant progress, i.e., (1) developing guidelines for the grammar partition through a set of heuristics, (2) devising a new mixstrategy composition algorithms for any rulebased grammar partition in a lattice framework, and 3) initial but encouraging parsing results for Chinese and English queries from an Air Travel Information System (ATIS) corpus.
pdf
abs
Modular Unificationbased Parsers
Rémi Zajac

Jan Amtrup
We present an implementation of the notion of modularity and composition applied to unification based grammars. Monolithic unification grammars can be decomposed into subgrammars with well defined interfaces. Subgrammars are applied in a sequential manner at runtime, allowing incremental development and testing of large coverage grammars. The modular approach to grammar development leads us away from the traditional view of parsing a string of input symbols as the recognition of some start symbol, and towards a richer and more flexible view where inputs and outputs share the same structural properties.
pdf
Hypergraph Unificationbased Parsing for Incremental Speech Processing
Jan Amtrup
pdf
abs
Parsing Mildly Contextsensitive RMS
Tilman Becker

Dominik Heckmann
We introduce Recursive Matrix Systems (RMS) which encompass mildly contextsensitive formalisms and present efficient parsing algorithms for linear and contextfree variants of RMS. The time complexities are 𝒪(n^{2h + 1}), and 𝒪(n^{3h}) respectively, where h is the height of the matrix. It is possible to represent Tree Adjoining Grammars (TAG [1], MCTAG [2], and RTAG [3]) as RMS uniformly.
pdf
Property Grammars: a Solution for Parsing with Constraints
Philippe Blache
pdf
Grammar Organization for Cascadebased Parsing in Information Extraction
Fabio Ciravegna

Alberto Lavelli
pdf
A Bidirectional Bottomup Parser for TAG
Víctor Díaz

Vicente Carrillo

Miguel Alonso
pdf
abs
A Finitestate Parser with Dependency Structure Output
David Elworthy
We show how to augment a finitestate grammar with annotations which allow dependency structures to be extracted. There are some difficulties in determinising the grammar, which is an essential step for computational efficiency, but they can be overcome. The parser also allows syntactically ambiguous structures to be packed into a single representation.
pdf
Discriminant Reverse LR Parsing of Contextfree Grammars
Jacques Farré
pdf
Direct Parsing of SchemaTAGs
Karin Harbusch

Jens Woch
pdf
abs
Analysis of Equation Structure using Least Cost Parsing
R. Nigel Horspool

John Aycock
Mathematical equations in LaTeX are composed with tags that express formatting as opposed to structure. For conversion from LaTeX to other wordprocessing systems, the structure of each equation must be inferred. We show how a form of least cost parsing used with a very general and ambiguous grammar may be used to select an appropriate structure for a LaTeX equation. MathML provides another application for the same technology; it has two alternative tagging schemes  presentation tags to specify formatting and content tags to specify structure. While conversion from content tagging to presentation tagging is straightforward, the converse is not. Our implementation of least cost parsing is based on Earley’s algorithm.
pdf
abs
Exploiting Parallelism in Unificationbased Parsing
Marcel P. van Lohuizen
Because of the nature of the parsing problem, unificationbased parsers are hard to parallelize. We present a parallelization technique designed to cope with these difficulties.
pdf
abs
Partial Parsing with Grammatical Features
Natasa Manousopoulou

George Papakonstantinou

Panayotis Tsanakas
This paper describes a rule based method for partial parsing, particularly for noun phrase recognition, which has been used in the development of a noun phrase recognizer for Modern Greek. This technique is based on a cascade of finite state machines, adding to them a characteristic very crucial in the parsing of words with free word order: the simultaneous examination of part of speech and grammatical feature information, which are deemed equally important during the parsing procedure, in contrast with other methodologies.
pdf
Uniquely Parsable Accepting Grammar Systems
Carlos MartínVide

Victor Mitrana
pdf
Chart Parsing as Constraint Propagation
Frank Morawietz
pdf
abs
Treestructured Chart Parsing
Paul W. Placeway
We investigate a method of improving the memory efficiency of a chart parser. Specifically, we propose a technique to reduce the number of active arcs created in the process of parsing. We sketch the differences in the chart algorithm, and provide empirical results that demonstrate the effectiveness of this technique.
pdf
A Parsing Methodology for Error Detection
Davide Turcato

Devlan Nicholson

Trude Heift

Janine Toole

Stavroula Tsiplakou
pdf
abs
Dependency Model using Posterior Context
Kiyotaka Uchimoto

Masaki Murata

Satoshi Sekine

Hitoshi Isahara
We describe a new model for dependency structure analysis. This model learns the relationship between two phrasal units called bunsetsus as three categories; ‘between’, ‘dependent’, and ‘beyond’, and estimates the dependency likelihood by considering not only the relationship between two bunsetsus but also the relationship between the left bunsetsu and all of the bunsetsus to its right. We implemented this model based on the maximum entropy model. When using the Kyoto University corpus, the dependency accuracy of our model was 88%, which is about 1% higher than that of the conventional model using exactly the same features.
pdf
abs
The Editing Distance in Shared Forest
Manuel Vilares

David Cabrero

Francisco J. Ribadas
In an information system indexing can be accomplished by creating a citation based on contextfree parses, and matching becomes a natural mechanism to extract patterns. However, the language intended to represent the document can often only be approximately defined, and indices can become shared forests. Queries could also vary from indices and an approximate matching strategy becomes also necessary. We present a proposal intended to prove the applicability of tabulation techniques in this context.