The standard set of rules defined in Combinatory Categorial Grammar (CCG) fails to provide satisfactory analyses for a number of syntactic structures found in natural languages.
These structures can be analyzed elegantly by augmenting CCG with a class of rules based on the combinator D (Curry and Feys, 1958).
We show two ways to derive the D rules: one based on unary composition and the other based on a logical characterization of CCG's rule base (Baldridge, 2002).
We also show how Eisner's (1996) normal form constraints follow from this logic, ensuring that the D rules do not lead to spurious ambiguities.
1 Introduction
Combinatory Categorial Grammar (CCG, Steedman (2000)) is a compositional, semantically transparent formalism that is both linguistically expressive and computationally tractable.
It has been used for a variety of tasks, such as wide-coverage parsing (Hock-enmaier and Steedman, 2002; Clark and Curran, 2007), sentence realization (White, 2006), learning semantic parsers (Zettlemoyer and Collins, 2007), dialog systems (Kruijff et al., 2007), grammar engineering (Beavers, 2004; Baldridge et al., 2007), and modeling syntactic priming (Reitter et al., 2006).
A distinctive aspect of CCG is that it provides a very flexible notion of constituency.
This supports elegant analyses of several phenomena (e.g., coordination, long-distance extraction, and intonation) and allows incremental parsing with the competence grammar (Steedman, 2000).
Here, we argue
that even with its flexibility, CCG as standardly defined is not permissive enough for certain linguistic constructions and greater incrementality.
Following Wittenburg (1987), we remedy this by adding a set of rules based on the D combinator of combinatory logic (Curry and Feys, 1958).
We show that CCG augmented with this rule improves CCG's empirical coverage by allowing better analyses of modal verbs in English and causatives in Spanish, and certain coordinate constructions.
The D rules are well-behaved; we show this by deriving them both from unary composition and from the logic defined by Baldridge (2002).
Both perspectives on D ensure that the new rules are compatible with normal form constraints (Eisner, 1996) for controlling spurious ambiguity.
The logic also ensures that the new rules are subject to modalities consistent with those defined by Baldridge and Kruijff (2003).
Furthermore, we define a logic that produces Eisner's constraints as grammar internal theorems rather than parsing stipulations.
2 Combinatory Categorial Grammar
CCG functors are functions over strings of symbols, so different linearized versions of each of the combinators have to be specified (ignoring S here):
The symbols {*, o, x, •} are modalities that allow subtypes of slashes to be defined; this in turn allows the slashes on categories to be defined in a way that allows them to be used (or not) with specific subsets of the above rules.
The rules of this multimodal version of CCG (Baldridge, 2002; Baldridge and Krui-jff, 2003) are derived as theorems of a Categorial Type Logic (CTL, Moortgat (1997)).
This treats CCG as a compilation of CTL proofs, providing a principled, grammar-internal basis for restrictions on the CCG rules, transferring language-particular restrictions on rule application to the lexicon, and allowing the CCG rules to be viewed as grammatical universals (Baldridge and Kruijff, 2003; Steedman and Baldridge, To Appear).
These rules—especially the B rules—allow derivations to be partially associative: given appropriate type assignments, a string ABC can be analyzed as either A(BC) or (AB)C. This associativity leads to elegant analyses of phenomena that demand more effort in less flexible frameworks.
One of the best known is "odd constituent" coordination:
(4) Bob gave Stan a beer and Max a coke.
(5) I will buy and you will eat a cheeseburger.
The coordinated constituents are challenging because they are at odds with standardly assumed phrase structure constituents.
In CCG, such constituents simply follow from the associativity added by the B and T rules.
For example, given the category assignments in (6) and the abbreviations in (7), (4) is analyzed as in (8) and (9).
Each conjunct is a pair of type-raised NPs combined by means of the >B -rule, deriving two composed constituents that are arguments to the conjunction:1
Similarly, I will buy is derived with category s/np by assuming the category (6i) for I and composing that with both verbs in turn.
CCG's approach is appealing because such constituents are not odd at all: they simply follow from the fact that CCG is a system of type-based grammatical inference that allows left associativity.
3 Linguistic Motivation for D
CCG is only partially associative.
Here, we discuss several situations which require greater associativity and thus cannot be given an adequate analysis with CCG as standardly defined.
These structures have in common that a category of the form x|(y|z) must combine with one of the form y|w—exactly the configuration handled by the D schemata in (1).
3.1 Cross-Conjunct Extraction
In the first situation, a question word is distributed across auxiliary or subordinating verb categories:
(10) . . . what you can and what you must not base your verdict on.
We call this cross-conjunct extraction.
It was noted by Pickering and Barry (1993) for English, but to the best of our knowledge it has not been treated in the
1We follow (Steedman, 2000) in assuming that type-raising applies in the lexicon, and therefore that nominals such as Stan
have type-raised lexical assignments.
We also suppress semantic representations in the derivations for the sake of space.
CCG literature, nor noted in other languages.
The problem it presents to CCG is clear in (11), which shows the necessary derivation of (10) using standard multimodal category assignments.
For the tokens of what to form constituents with you can and you must not, they must must combine directly.
The problem is that these constituents (in bold) cannot be created with the standard CCG combinators in (3).
base your verdict on
what you can
The category for and is marked for non-associativity with *, and thus combines with other expressions only by function application (Baldridge, 2002).
This ensures that each conjunct is a discrete constituent.
(14) Gandeste-te cui ge vrei, consider.imper.2s-refl.2s who.dat what want.2s §i cui ge pofi, sa dai.
"Consider to whom you want and to whom you are able to give what."
It is thus a general phenomenon, not just a quirk of English.
While it could be handled with extra categories, such as (s/(vp/np))/(s/np) for what, this is exactly the sort of strong-arm tactic that inclusion of the standard B, T, and S rules is meant to avoid.
3.2 English Auxiliary Verbs
However, this type is empirically underdetermined, given a widely-noted set of generalizations suggesting that auxiliaries and raising verbs take no subject argument at all (Jacobson, 1990, a.o.).
Lack of semantic restrictions on the subject;
iii.
Inheritance of selectional restrictions from the subordinate predicate.
Two arguments are made for (16).
First, it is necessary so that type-raised subjects can compose with the auxiliary in extraction contexts, as in (18):
Second, it is claimed to be necessary in order to account for subject-verb agreement, on the assumption that agreement features are domain restrictions on functors of type s\np (Steedman, 1992, 1996).
The first argument is the topic of this paper, and, as we show below, is refuted by the use of the D-combinator.
The second argument is undermined by examples like (19):
the average age of retirement ] ] .
In (19), appear agrees with two negative-polarity-sensitive NPs trapped inside a neither-nor coordinate structure in which they are licensed.
Appear therefore does not combine with them directly, showing that the agreement relation need not be mediated by direct application of a subject argument.
We conclude, therefore, that the assignment of the vp/vp type to English auxiliaries and modal verbs is unsupported on both formal and linguistic grounds.
Combining (20) with a type-raised subject presents another instance of the structure in (1), where that question words are represented as variable-binding operators (Groenendijk and Stokhof, 1997):
3.3 The Spanish Causative Construction
The aspect of the construction that is relevant here is that the causative verb hacer appears to take an object argument understood as the subject or agent of the subordinate verb (the causee).
However, it has been argued that Spanish causative verbs do not in fact take objects (Ackerman and Moore, 1999, and refs therein).
There are two arguments for this.
First, syntactic alternations that apply to object-taking verbs, such as passivization and periphrasis with subjunctive complements, do not apply to hacer (Lujan, 1980).
Second, hacer specifies neither the case form of the causee, nor any semantic entail-ments with respect to it.
These are instead determined by syntactic, semantic, and pragmatic factors, such as transitivity, word order, animacy, gender, social prestige, and referential specificity (Finnemann, 1982, a.o).
Thus, there is neither syntactic nor semantic evidence that hacer takes an object argument.
However, Spanish has examples of cross-conjunct extraction in which hacer hosts clitics:
le hicieron barrer la verada.
cl.dat.3ms made.3p sweep the sidewalk
"They not only ordered him to, but also made him sweep the sidewalk."
hicieron barrer la verada
The preceding data motivates adding D rules (we return to the distribution of the modalities below):
To illustrate with example (10), one application of >D allows you and can to combine when the auxiliary is given the principled type assignment s/s, and another combines what with the result.
The derivation then proceeds in the usual way.
Likewise, D handles the Spanish causative constructions (29) straightforwardly :
hice dormir
The D-rules thus provide straightforward analyses of such constructions by delivering flexible constituency while maintaining CCG's committment to low categorial ambiguity and semantic transparency.
4 Deriving Eisner Normal Form
Adding new rules can have implications for parsing efficiency.
In this section, we show that the D rules fit naturally within standard normal form constraints for CCG parsing (Eisner, 1996), by providing both
combinatory and logical bases for D. This additionally allows Eisner's normal form constraints to be derived as grammar internal theorems.
4.1 The Spurious Ambiguity Problem
CCG's flexibility is useful for linguistic analyses, but leads to spurious ambiguity (Wittenburg, 1987) due to the associativity introduced by the B and T rules.
This can incur a high computational cost which parsers must deal with.
Several techniques have been proposed for the problem (Wittenburg, 1987; Karttunen, 1989; Hepple and Morrill, 1989; Eisner, 1996).
The most commonly used are Karttunnen's chart subsumption check (White and Baldridge, 2003; Hockenmaier and Steedman, 2002) and Eisner's normal-form constraints (Bozsahin, 1998; Clark and Curran, 2007).
Eisner's normal form, referred to here as Eisner NF and paraphrased in (30), has the advantage ofnot requiring comparisons of logical forms it functions purely on the syntactic types being combined.
i. C is not the argument of (AB) resulting from application of > B1 +.
A is not the argument of (BC) resulting from application of < B1 +.
The implication is that outputs of B1+ rules are inert, using the terminology of Baldridge (2002).
Inert slashes are Baldridge's (2002) encoding in OpenCCG3 of his CTL interpretation of Steedman's (2000) antecedent-government feature.
Eisner derives (30) from two theorems about the set of semantically equivalent parses that a CCG parser will generate for a given string (see (Eisner, 1996) for proofs and discussion of the theorems)
(31) Theorem 1: For every parse tree a, there is a semantically equivalent parse-tree NF(a) in which no node resulting from application of B or S functions as the primary functor in a rule application.
(32) Theorem 2: If NF(a) and NF(a') are distinct parse trees, then their model-theoretic interpretations are distinct.
2Two parse trees are semantically equivalent if: (i) their leaf nodes have equivalent interpretations, and (ii) equivalent scope relations hold between their respective leaf-node meanings.
3http://openccg.sourceforge.net
Eisner uses a generalized form Bn (n>0) of composition that subsumes function application:4
i. If a is a lexical item, then a is in Eisner-NF.
ii.
If a is a parse tree (R, //, 7} and NF (//), NF (7 ),then NF (a).
NF (a) = (S,/31 ,NF (<T,/?
2,7})}.
As a parsing constraint, (30) is a filter on the set of parses produced for a given string.
It preserves all the unique semantic forms generated for the string while eliminating all spurious ambiguities: it is both safe and complete.
Given the utility of Eisner NF for practical CCG parsing, the D rules we propose should be compatible with (30).
This requires that the generalizations underlying (30) apply to D as well.
In the remainder of this section, we show this in two ways.
The first is to derive the binary B rules from a unary rule based on the unary combinator B :5
We then derive D from B and show that clause (iii) of (35) holds of Q schematized over both B and D.
Applying D to an argument sequence is equivalent to compound application of binary B:
Syntactically, binary B is equivalent to application of unary B to the primary functor A, followed by applying the secondary functor r to the output of B by means of function application (Jacobson, 1999):
4We use Steedman's (Steedman, 1996) "$"-convention for representing argument stacks of length n, for n > 0.
5This is Lambek's (1958) Division rule, also known as the "Geach rule" (Jacobson, 1999).
The rules for D correspond to application of B to both the primary and secondary functors, followed by function application:
As with Bn, Dn-1 can be derived by iterative application of B to both primary and secondary functors.
Interpreted in terms of B, both B and D involve application of B to the primary functor.
It follows that Theorem I applies directly to D simply by virtue of the equivalence between binary B and unary-B+FA.
Eisner's NF constraints can then be reinterpreted as a constraint on B requiring its output to be an inert result category.
We represent this in terms of the B-rules introducing an inert slash, indicated with "!"
(adopting the convention from OpenCCG):
The binary substitution (S) combinator can be similarly incorporated into the system.
Unary substitution S is like B except that it introduces a slash on only the argument-side of the input functor.
We stipulate that S returns a category with inert slashes:
T is by definition unary.
It follows that all the binary rules in CCG (including the D-rules) can be reduced to (iterated) instantiations of the unary combinators B, S, or T plus function application.
This provides a basis for CCG in which all com-binatory rules are derived from unary BS SS, and T.
4.3 A Logical Basis for Eisner Normal Form
The previous section shows that deriving CCG rules from unary combinators allows us to derive the D-rules while preserving Eisner NF.
In this section, we present an alternate formulation of Eisner NF with
Baldridge's (2002) CTL basis for CCG.
This formulation allows us to derive the D-rules as before, and does so in a way that seamlessly integrates with Baldridge's system of modalized functors.
In CTL, Bo and B x are proofs derived via structural rules that allow associativity and permutation of symbols within a sequent, in combination with the slash introduction and elimination rules of the base logic.
To control application of these rules, Baldridge keys them to binary modal operators o (for associativity) and x (for permutation).
Given these, > B is proven in (47):
In a CCG ruleset compiled from such logics, a category must have an appropriately decorated slash in order to be the input to a rule.
This means that rules apply universally, without language-specific
restrictions.
Instead, restrictions can only be declared via modalities marked on lexical categories.
The D rules are also theorems of this system.
For example, the proof for > D applies (48) as a lemma to each of the primary and secondary functors:
> D ox involves an associative version of S applied to the primary functor (50), and a permutative version to the secondary functor (51).
Rules for D with appropriate modalities can therefore be incorporated seamlessly into CCG.
In the preceding subsection, we encoded Eisner NF with inert slashes.
In Baldridge's CTL basis for CCG, inert slashes are represented as functors seeking non-lexical arguments, represented as categories marked with an antecedent-governed feature,
reflecting the intuition that non-lexical arguments have to be "bound" by a superordinate functor.
This is based on an interpretation of antecedent-government as a unary modality  ant that allows structures marked by it to permute to the left or right periphery of a structure:6
Unlike permutation rules without  ant, these permutation rules can only be used in a proof when preceeded by a hypothetical category marked with the ^ant modality.
The elimination rule for □ -modalities introduces a corresponding  -marked object in the resulting structure, feeding the rule:
ant nant zJ
Re-introduction of the [a h  ant□antz]k hypothesis results in a functor the argument of which is marked with  ant □ant.
Because lexical categories are not marked as such, the functor cannot take a lexical argument, and so is effectively an inert functor.
In Baldridge's (2002) system, only proofs involving the ARP and ALP rules produce inert categories.
In Eisner NF, all instances of -rules result in inert categories.
This can be reproduced in Baldridge's system simply by keying all structural rules to the ant-modality, the result being that all proofs involving structural rules result in inert functors.
As desired, the D-rules result in inert categories as well.
For example, >D is derived as follows (□
6Note that the diamond operator used here is a syntactic operator, rather than a semantic operator as used in (16) above.
The unary modalities used in CTL describe accessibility relationships between subtypes and supertypes of particular categories: in effect, they define feature hierarchies.
See Moortgat (1997) and Oehrle (To Appear) for further explanation.
This means that all CCG rules compiled from the logic—which requires  ant to licence the structural rules necessary to prove the rules—return inert functors.
Eisner NF thus falls out of the logic because all instances of B, D, and S produce inert categories.
This in turns allows us to view Eisner NF as part of a theory of grammatical competence, in addition to being a useful technique for constraining parsing.
5 Conclusion
Including the D -combinator rules in the CCG rule set lets us capture several linguistic generalizations that lack satisfactory analyses in standard CCG.
Furthermore, CCG augmented with D is compatible with Eisner NF (Eisner, 1996), a standard technique for controlling derivational ambiguity in CCG-parsers, and also with the modalized version
of CCG (Baldridge and Kruijff, 2003).
A consequence is that both the D rules and the NF constraints can be derived from a grammar-internal perspective.
This extends CCG's linguistic applicability without sacrificing efficiency.
Wittenburg (1987) originally proposed using rules based on D as a way to reduce spurious ambiguity, which he achieved by eliminating B rules entirely and replacing them with variations on D. Wittenburg notes that doing so produces as many instances of D as there are rules in the standard rule set.
Our proposal retains B and S, but, thanks to Eisner NF, eliminates spurious ambiguity, a result that Wittenburg was not able to realize at the time.
Our approach can be incorporated into Eisner NF straightforwardly However, Eisner NF disprefers incremental analyses by forcing right-corner analyses of long-distance dependencies, such as in (58):
For applications that call for increased incremental-ity (e.g., aligning visual and spoken input incrementally (Kruijff et al., 2007)), CCG rules that do not
produce inert categories can be derived a CTL basis that does not require  ant for associativity and permutation.
The D -rules derived from this kind of CTL specification would allow for left-corner analyses of such dependencies with the competence grammar.
An extracted element can "wrap around" the words intervening between it and its extraction site.
For example, D would allow the following bracketing for the same example (while producing the same logical form):
Finally, the unary combinator basis for CCG provides an interesting additional specification for generating CCG rules.
Like the CTL basis, the unary combinator basis can produce a much wider range of possible rules, such as D rules, that may be relevant for linguistic applications.
Whichever basis is used, inclusion of the D-rules increases empirical coverage, while at the same time preserving CCG's computational attractiveness.
Acknowledgments
Thanks Mark Steedman for extensive comments and suggestions, and particularly for noting the relationship between the D-rules and unary BS.
Thanks also to Emmon Bach, Cem Bozsahin, Jason Eisner, Geert-Jan Kruijff and the ACL reviewers.
