We present Pro3Gres, a deep-syntactic, fast dependency parser that combines a handwritten competence grammar with probabilistic performance disambiguation and that has been used in the biomedical domain.
We discuss its performance in the domain adaptation open submission.
We achieve average results, which is partly due to difficulties in mapping to the dependency representation used for the shared task.
1 Introduction
The Pro3Gres parser is a dependency parser that combines a hand-written grammar with probabilistic disambiguation.
It is described in detail in (Schneider, 2007).
It uses tagger and chunker pre-processors (parsing proper happens only between the heads of chunks) and a post-processing graph converter to capture long-distance dependencies.
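To make "parsing proper happens only between heads of chunks" concrete, the following minimal Python sketch reduces chunker output to one head token per chunk before any attachment decisions are made. The head rule (rightmost noun or verb in the chunk, else its last token) and the example chunking are assumptions for illustration, not the actual Pro3Gres chunker or head rules.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank tag)

def chunk_head(chunk: List[Token]) -> Token:
    """Hypothetical head rule: rightmost noun or verb in the chunk, else its last token."""
    for word, tag in reversed(chunk):
        if tag.startswith(("NN", "VB")):
            return (word, tag)
    return chunk[-1]

def heads_for_parsing(chunks: List[List[Token]]) -> List[Token]:
    """Reduce each chunk to its head; only these heads are passed on to the parser."""
    return [chunk_head(chunk) for chunk in chunks]

# Chunker output for "The protein binds the receptor" (NP, verb group, NP).
chunks = [
    [("The", "DT"), ("protein", "NN")],
    [("binds", "VBZ")],
    [("the", "DT"), ("receptor", "NN")],
]
print(heads_for_parsing(chunks))
# [('protein', 'NN'), ('binds', 'VBZ'), ('receptor', 'NN')]
```

The parser then only has to choose among attachments between three tokens instead of five, which is one reason the search space stays small.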
Pro3Gres is embedded in a flexible XML pipeline.
It has been applied to many tasks, such as parsing biomedical literature (Rinaldi et al., 2006; Rinaldi et al., 2007) and the whole British National Corpus, and has been evaluated in several ways.
We achieved average results in the open submission of the CoNLL domain adaptation track (Marcus et al., 1993; Johansson and Nugues, 2007; Kulick et al., 2004; MacWhinney, 2000; Brown, 1973).
The performance of the parser is seriously affected by problems in mapping to the particular dependency representation used in the shared task.
The paper is structured as follows.
We give a brief overview of the parser and its design policy in section 2, describe the domain adaptations that we have used in section 3, comment on the results obtained in section 4 and conclude in section 5.
2 Pro3Gres and its Design Policy
There has been growing interest in exploring the space between Treebank-trained probabilistic grammars (e.g. (Collins, 1999; Nivre, 2006)) and formal grammar-based parsers integrating statistics. We have developed a parsing system that explores this space, in the vein of systems like (Kaplan et al., 2004), using a linguistic competence grammar and a probabilistic performance disambiguation that allows us to explore interactions between lexicon and grammar (Sinclair, 1996).
The parser has been explicitly designed to be deep-syntactic like a formal grammar-based parser, by using a dependency representation that is close to LFG f-structure, but at the same time to be mostly context-free and to integrate shallow approaches and aggressive pruning in order to keep search spaces small, without compromising performance or linguistic adequacy.
(Abney, 1995) establishes the chunks and dependencies model as a well-motivated linguistic theory.
The non-local linguistic constraints that a hand-written grammar allows us to formulate, e.g. expressing X-bar principles or barring very marked constructions, further reduce parsing time by at least an order of magnitude.
Since the grammar is written over Penn Treebank tags (except for a few closed-class words, e.g. allowing "including" to function as a preposition), the effort of writing it manually is manageable.
It was developed from scratch in about one person-month, using traditional grammar engineering development cycles.
It contains about 1000 rules; the number is so high largely because of tag combinatorics: for example, the various subject attachment rules combining a subject (NN, NNS, NNP, NNPS) and a verb (VBZ, VBP, VBG, VBN, VBD) are all very similar.
Figure 1: Pro3Gres parser flowchart
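A small Python sketch shows where the rule count comes from: if every subject-tag/verb-tag pairing listed above were written out as its own rule, subject attachment alone would need 4 × 5 = 20 near-identical rules. This only illustrates the arithmetic; it does not claim that Pro3Gres contains exactly these rules.

```python
from itertools import product

# Tags taken from the subject-attachment example in the text.
SUBJECT_TAGS = ["NN", "NNS", "NNP", "NNPS"]
VERB_TAGS = ["VBZ", "VBP", "VBG", "VBN", "VBD"]

def subject_rule_variants(subject_tags, verb_tags):
    """One (subject tag, verb tag) pair per near-identical rule a grammar writer would spell out."""
    return list(product(subject_tags, verb_tags))

variants = subject_rule_variants(SUBJECT_TAGS, VERB_TAGS)
print(len(variants))  # 20
print(variants[:2])   # [('NN', 'VBZ'), ('NN', 'VBP')]
```

Repeating this multiplication over the other relation types quickly reaches the order of a thousand rules, even though each family encodes a single linguistic generalisation.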
The parser is fast enough for large-scale application to unrestricted texts, and it delivers dependency relations which are a suitable base for a range of applications.
We have used it to parse the entire 100-million-word British National Corpus (http://www.natcorp.ox.ac.uk) and similar amounts of biomedical texts.
Its parsing speed is about 500,000 words per hour.
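As a back-of-the-envelope check of what this speed means for the BNC, the following Python lines divide corpus size by throughput. They assume a single parser process and ignore tagging and chunking time, neither of which the text specifies.

```python
CORPUS_WORDS = 100_000_000   # size of the British National Corpus, as stated above
WORDS_PER_HOUR = 500_000     # stated parsing speed

hours = CORPUS_WORDS / WORDS_PER_HOUR
days = hours / 24
print(f"{hours:.0f} hours, about {days:.1f} days")  # 200 hours, about 8.3 days
```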
The flowchart of the parser can be seen in figure 1.
Pro3Gres (PRObabilistic PROlog-implemented RObust Grammatical Role Extraction System) uses a dependency representation that is close to LFG f-structure, in order to give it an established linguistic background.
It uses post-processing graph structure conversions and mild context-sensitivity to capture long-distance dependencies.
We have argued in (Schneider, 2005) that LFG f-structures can be parsed in a completely context-free fashion, except for embedded WH-questions, where a device such as functional uncertainty (Kaplan and Zaenen, 1989) or the equivalent Tree-Adjoining Grammar Adjoining operation (Joshi and Vijay-Shanker, 1989) is used.
In Dependency Grammar, this device is also known as lifting (Kahane et al., 1998; Nivre and Nilsson, 2005).
We use a hand-written competence grammar, combined with performance-driven disambiguation obtained from the Penn Treebank (Marcus et al., 1993).
The Maximum-Likelihood Estimation (MLE) probability of generating a dependency relation R given the lexical heads a and b at a distance δ (in chunks) is estimated from relative frequencies in the Penn Treebank.
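As a generic illustration, and not necessarily the exact decomposition used in Pro3Gres (Schneider, 2007 gives the details), the following Python sketch computes the plain relative-frequency estimate p(R | a, b, δ) = #(R, a, b, δ) / #(a, b, δ) from counted training events. The event tuples and relation names are invented toy data.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

# (relation R, head a, dependent b, distance δ in chunks)
Event = Tuple[str, str, str, int]

def train_mle(events: Iterable[Event]) -> Callable[[str, str, str, int], float]:
    """Relative-frequency estimate of p(R | a, b, δ) = #(R, a, b, δ) / #(a, b, δ)."""
    events = list(events)
    joint = Counter(events)
    context = Counter((a, b, d) for _, a, b, d in events)

    def prob(r: str, a: str, b: str, d: int) -> float:
        total = context[(a, b, d)]
        return joint[(r, a, b, d)] / total if total else 0.0

    return prob

# Toy counts (invented): "child" as subject or object of "eat" at distance 1.
events = [("subj", "eat", "child", 1)] * 3 + [("obj", "eat", "child", 1)]
p = train_mle(events)
print(p("subj", "eat", "child", 1))  # 0.75
print(p("subj", "eat", "apple", 1))  # 0.0 (unseen context, which is what backoff addresses)
```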
The counts are backed off (Collins, 1999; Merlo and Esteve Ferrer, 2006).
The backoff levels include semantic classes from WordNet (Fellbaum, 1998): we back off to the lexicographer file ID of the most frequent word sense.
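The idea of backing off from words to WordNet lexicographer classes can be sketched in Python as below. The hard-coded class lookup stands in for "lexicographer file of the most frequent sense" (with NLTK, for instance, `Synset.lexname()` on the first-listed sense would provide such a class). The order of backoff levels and the simple "use the first level whose context was observed" scheme are assumptions; Collins-style models interpolate between levels instead.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for "lexicographer file of the most frequent WordNet sense".
LEXFILE: Dict[str, str] = {
    "eat": "verb.consumption", "devour": "verb.consumption",
    "child": "noun.person", "toddler": "noun.person",
}

def lexfile(word: str) -> str:
    return LEXFILE.get(word, "UNKNOWN")

# Backoff levels, most specific first; each maps (head, dependent) to a conditioning context.
LEVELS: List[Callable[[str, str], Tuple[str, str]]] = [
    lambda a, b: (a, b),                    # head word, dependent word
    lambda a, b: (lexfile(a), b),           # head class, dependent word
    lambda a, b: (a, lexfile(b)),           # head word, dependent class
    lambda a, b: (lexfile(a), lexfile(b)),  # head class, dependent class
]

def train_backoff(events: List[Tuple[str, str, str]]) -> Callable[[str, str, str], float]:
    """events are (relation, head, dependent) triples; returns p(relation | head, dependent)."""
    tables = []
    for level in LEVELS:
        joint = Counter((r, level(a, b)) for r, a, b in events)
        context = Counter(level(a, b) for _, a, b in events)
        tables.append((joint, context))

    def prob(r: str, a: str, b: str) -> float:
        # Use the most specific level at which the context has been observed.
        for level, (joint, context) in zip(LEVELS, tables):
            key = level(a, b)
            if context[key]:
                return joint[(r, key)] / context[key]
        return 0.0

    return prob

events = [("subj", "eat", "child")] * 2 + [("obj", "eat", "child")]
p = train_backoff(events)
# "devour"/"toddler" were never seen, but their classes were: the estimate is 2/3.
print(p("subj", "devour", "toddler"))
```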
An example output of the parser is shown in figure 2.
3 Domain Adaptation
Based on our experience with parsing texts from the biomedical domain, we have used the following two adaptations for the chemistry domain.
(Hindle and Rooth, 1993) exploit the fact that in sentence-initial NP PP sequences the PP unambiguously attaches to the noun.
We have observed that in sentence-initial NP PP PP sequences, the second PP also frequently attaches to the noun, the noun itself often being a relational noun.
We have thus used such sequences to learn relational nouns from the unlabelled domain texts.
In the grammar, relational nouns are allowed to attach several argument PPs; all other nouns are not.
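The learning step above can be sketched in Python as follows: count the head nouns of sentence-initial NP PP PP chunk sequences in unlabelled text and keep those seen often enough. The chunk representation, the `min_count` threshold and the boolean grammar check are hypothetical simplifications, not the actual Pro3Gres implementation.

```python
from collections import Counter
from typing import List, Set, Tuple

Chunk = Tuple[str, str]  # (chunk label, head word)

def learn_relational_nouns(sentences: List[List[Chunk]], min_count: int = 2) -> Set[str]:
    """Collect head nouns of sentence-initial NP PP PP sequences seen at least min_count times."""
    counts: Counter = Counter()
    for chunks in sentences:
        if [label for label, _ in chunks[:3]] == ["NP", "PP", "PP"]:
            counts[chunks[0][1]] += 1
    return {noun for noun, c in counts.items() if c >= min_count}

def allows_several_pp_args(noun: str, relational: Set[str]) -> bool:
    """Grammar-side check: only relational nouns may attach several argument PPs."""
    return noun in relational

sentences = [
    [("NP", "inhibition"), ("PP", "of"), ("PP", "by"), ("VP", "was")],
    [("NP", "inhibition"), ("PP", "of"), ("PP", "in"), ("VP", "is")],
    [("NP", "cells"), ("PP", "in"), ("VP", "grew")],
]
relational = learn_relational_nouns(sentences)
print(relational)                                   # {'inhibition'}
print(allows_several_pp_args("cells", relational))  # False
```

Because sentence-initial position rules out a preceding verb, no treebank annotation is needed: the attachment is read off the chunk sequence itself.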
Multi-word terms, adjective-preposition constructions and frequent PP-arguments have strong collocational force.
We have thus used the collocation extraction tool XTRACT (Smadja, 2003) to discover collocations from large domain corpora.
The probability of generating a dependency relation is augmented for collocations above a certain threshold.
Since the tagging quality of the chemistry test set is high, the impact of multi-word term recognition was lower than in the biomedical domain when using a standard tagger, as we have shown in (Rinaldi et al., 2007).
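The collocation adjustment described above can be sketched in Python as a thresholded boost of the attachment probability. The strength scores, the threshold of 10.0 and the doubling factor are all hypothetical values chosen for illustration (the scores are not real XTRACT output), and a real system might renormalise over competing relations rather than simply capping at 1.

```python
from typing import Dict, Tuple

def boosted_prob(p: float, pair: Tuple[str, str], strength: Dict[Tuple[str, str], float],
                 threshold: float = 10.0, factor: float = 2.0) -> float:
    """Raise p for pairs whose collocation strength reaches the threshold (all values hypothetical)."""
    if strength.get(pair, 0.0) >= threshold:
        return min(1.0, p * factor)  # capped so the result stays a valid probability
    return p

strength = {("soluble", "in"): 15.0, ("red", "in"): 2.0}
print(boosted_prob(0.3, ("soluble", "in"), strength))  # 0.6
print(boosted_prob(0.3, ("red", "in"), strength))      # 0.3
```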
For the CHILDES domain, we have not used any adaptation.
The hand-written grammar fares quite well on most types of questions, which are very frequent in this domain.
In the spirit of the shared task, we have not attempted to correct tagging errors, which were frequent in the CHILDES domain.
We have restricted the use of external resources to the hand-written, domain-independent grammar, and to WordNet.
Due to serious problems in mapping our LFG f-structure-based dependencies to the CoNLL representation, much less time than expected was available for the domain adaptation.
Figure 2: Example of original parser output
4 Our Results
We have achieved average results: labeled attachment score 3151 / 5001 * 100 = 63.01 %, unlabeled attachment score 3327 / 5001 * 100 = 66.53 %, label accuracy score 3832 / 5001 * 100 = 76.62 %.
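All three scores are ratios of correctly scored tokens to the 5001 scored tokens, using the standard CoNLL definitions (labeled attachment: correct head and label; unlabeled attachment: correct head; label accuracy: correct label). The short Python sketch below recomputes them, together with the CHILDES unlabeled score reported later in this section.

```python
def percent(correct: int, total: int) -> float:
    """Share of correctly scored tokens, in percent, rounded to two decimals."""
    return round(correct / total * 100, 2)

# Main results: 5001 scored tokens.
las = percent(3151, 5001)  # correct head and correct label
uas = percent(3327, 5001)  # correct head
la = percent(3832, 5001)   # correct label
# CHILDES: 4999 scored tokens (unlabeled score only).
childes_uas = percent(3013, 4999)
print(las, uas, la, childes_uas)  # 63.01 66.53 76.62 60.27
```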
These results are about 10 % below what we typically obtain when using our own dependency representation or GREVAL (Carroll et al., 2003), a deep-syntactic annotation scheme that is close to ours.
Detailed evaluations are reported in (Schneider, 2007).
Our mapping was quite poor, especially when conjunctions are involved.
Punctuation is also attached poorly.
5.7 % of all dependencies remained unmapped (unknown in the tables).
We give an overview of the relation-dependent results in tables 1 and 2.
Mapping problems include the following examples.
First, headedness is handled very differently: while we treat auxiliaries, prepositions and coordinations as dependents, the CoNLL representation assumes the opposite, which leads to incorrect mappings when these constructions interact in complex ways.
Second, the semantics of parentheticals (PRN) partly remains unclear.
In the sentence "Quinidine elimination was capacity limited with apparent Michaelis constant (appKM) of 2.6 microM (about 1.2 mg/L)", the gold standard annotates the second parenthesis as parenthetical but the first as nominal modification, although both may be said to be appositional in character.
Third, we seem to have misinterpreted the roles of ADV and AMOD, as they are often mutually exchanged.
Fourth, the logical subject (LGS) is sometimes marked on the by-PP (... are strongly inhibited by-LGS carbon monoxide) and sometimes on the participle (... are increased-LGS by pretreatment) in the gold standard.
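The first of these problems, opposite headedness, can be illustrated for prepositions with a minimal Python sketch: in a representation where the preposition depends on its noun, the conversion promotes the preposition to head and makes the noun its dependent. The labels PREP and PP are hypothetical placeholders rather than actual Pro3Gres relation names; PMOD is the CoNLL label for the dependent of a preposition. The sketch deliberately ignores coordination and auxiliaries, where the interactions that caused our mapping errors arise.

```python
from typing import Dict, Tuple

Arc = Tuple[int, str]  # (head index, label); index 0 is the artificial root

def promote_prepositions(arcs: Dict[int, Arc], prep_label: str = "PREP",
                         pmod_label: str = "PMOD") -> Dict[int, Arc]:
    """Turn noun-headed PPs into preposition-headed PPs (labels are hypothetical)."""
    converted = dict(arcs)
    for prep, (noun, label) in arcs.items():
        if label != prep_label:
            continue
        noun_head, noun_label = arcs[noun]
        converted[prep] = (noun_head, noun_label)  # preposition inherits the noun's attachment
        converted[noun] = (prep, pmod_label)       # noun now depends on the preposition
    return converted

# "eat with a spoon": 1=eat, 2=with, 3=a, 4=spoon
source = {1: (0, "ROOT"), 2: (4, "PREP"), 3: (4, "NMOD"), 4: (1, "PP")}
print(promote_prepositions(source))
# {1: (0, 'ROOT'), 2: (1, 'PP'), 3: (4, 'NMOD'), 4: (2, 'PMOD')}
```

Reading every original attachment from the unmodified input keeps nested PPs correct, but as soon as a coordination sits between the preposition and its noun, a single rule like this no longer suffices.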
Relations between chunk heads, such as SBJ, NMOD and ROOT, which are central to the predicate-argument structures that Pro3Gres aims to recover, perform better than those for which Pro3Gres was not originally designed, particularly ADV, AMOD, PRN and P. Performance on COORD was particularly disappointing.
Generally, mapping problems between different representations would be smaller if one used a dependency representation that maximally abstracts away from form to function, for example (Carroll et al., 2003).
We have obtained results slightly above average on the CHILDES domain, although we did not adapt the parser to this domain in any way (unlabeled attachment score: 3013 / 4999 * 100 = 60.27 %).
The hand-written grammar, which includes rules for most types of questions, fares relatively well in this domain because its coverage of questions does not depend on the Penn Treebank, where questions are rare (see (Hermjakob, 2001)).
Pro3Gres has been employed for question parsing at a TREC conference (Burger and Bayer, 2005).
Table 2: Precision and recall of DEPREL+ATTACHMENT
5 Conclusion
We have described the Pro3Gres parser.
We have achieved average results in the shared task with relatively little adaptation.
Mapping to different representations is an often underestimated task.
Our performance on the CHILDES task, where we did not adapt the parser, indicates that hand-written, carefully engineered competence grammars may be relatively domain-independent while performance disambiguation is more domain-dependent.
We will adapt the parser to further domains and include more unsupervised learning methods.
