In
lexicalized
grammatical
formalisms
,
it
is
possible
to
separate
lexical
category
assignment
from
the
combinatory
processes
that
make
use
of
such
categories
,
such
as
parsing
and
realization
.
We
adapt
techniques
from
supertagging
—
a
relatively
recent
technique
that
performs
complex
lexical
tagging
before
full
parsing
(
Bangalore
and
Joshi
,
1999
;
Clark
,
2002
)
—
for
chart
realization
in
OpenCCG
,
an
open-source
NLP
toolkit
for
CCG
.
We
call
this
approach
hypertagging
,
as
it
operates
at
a
level
"
above
"
the
syntax
,
tagging
semantic
representations
with
syntactic
lexical
categories
.
Our
results
demonstrate
that
a
hypertagger-informed
chart
realizer
can
achieve
substantial
improvements
in
realization
speed
(
being
approximately
twice
as
fast
)
with
superior
realization
quality
.
1
Introduction
In
lexicalized
grammatical
formalisms
such
as
Lexicalized
Tree
Adjoining
Grammar
(
Schabes
et
al.
,
1988
,
LTAG
)
,
Combinatory
Categorial
Grammar
(
Steedman
,
2000
,
CCG
)
and
Head-Driven
Phrase-Structure
Grammar
(
Pollard
and
Sag
,
1994
,
HPSG
)
,
it
is
possible
to
separate
lexical
category
assignment
—
the
assignment
of
informative
syntactic
categories
to
linguistic
objects
such
as
words
or
lexical
predicates
—
from
the
combinatory
processes
that
make
use
of
such
categories
—
such
as
parsing
and
surface
realization
.
One
way
of
performing
lexical
assignment
is
simply
to
hypothesize
all
possible
lexical
categories
and
then
search
for
the
best
combination
thereof
,
as
in
the
CCG
parser
in
(
Hock-enmaier
,
2003
)
or
the
chart
realizer
in
(
Carroll
and
Oepen
,
2005
)
.
A
relatively
recent
technique
for
lexical
category
assignment
is
supertagging
(
Bangalore
and
Joshi
,
1999
)
,
a
preprocessing
step
to
parsing
that
assigns
likely
categories
based
on
word
and
part-of-speech
(
POS
)
contextual
information
.
Supertagging
was
dubbed
"
almost
parsing
"
by
these
authors
,
because
an
oracle
supertagger
left
relatively
little
work
for
their
parser
,
while
speeding
up
parse
times
considerably
.
Supertagging
has
been
more
recently
extended
to
a
multitagging
paradigm
in
CCG
(
Clark
,
2002
;
Curran
et
al.
,
2006
)
,
leading
to
extremely
efficient
parsing
with
state-of-the-art
dependency
recovery
(
Clark
and
Curran
,
2007
)
.
We
have
adapted
this
multitagging
approach
to
lexical
category
assignment
for
realization
using
the
CCG-based
natural
language
toolkit
OpenCCG.1
Instead
of
basing
category
assignment
on
linear
word
and
POS
context
,
however
,
we
predict
lexical
categories
based
on
contexts
within
a
directed
graph
structure
representing
the
logical
form
(
LF
)
of
a
proposition
to
be
realized
.
Assigned
categories
are
instantiated
in
OpenCCG
's
chart
realizer
where
,
together
with
a
treebank-derived
syntactic
grammar
(
Hockenmaier
and
Steedman
,
2007
)
and
a
factored
language
model
(
Bilmes
and
Kirchhoff
,
2003
)
,
they
constrain
the
English
word-strings
that
are
chosen
to
express
the
LF
.
We
have
dubbed
this
approach
hy-pertagging
,
as
it
operates
at
a
level
"
above
"
the
syntax
,
moving
from
semantic
representations
to
syntactic
categories
.
We
evaluate
this
hypertagger
in
two
ways
:
first
,
1http
:
/
/
openccg.sourceforge.net
.
we
evaluate
it
as
a
tagger
,
where
the
hypertagger
achieves
high
single-best
(
93.6
%
)
and
multitagging
labelling
accuracies
(
95.8-99.4
%
with
category
per
lexical
predication
ratios
ranging
from
1.1
to
3.9
)
.
2
Second
,
we
compare
a
hypertagger-augmented
version
of
OpenCCG
's
chart
realizer
with
the
preexisting
chart
realizer
(
White
et
al.
,
2007
)
that
simply
instantiates
the
chart
with
all
possible
CCG
categories
(
subject
to
frequency
cutoffs
)
for
each
input
LF
predicate
.
The
hypertagger-seeded
realizer
runs
approximately
twice
as
fast
as
the
pre-existing
OpenCCG
realizer
and
finds
a
larger
number
of
complete
realizations
,
resorting
less
to
chart
fragment
assembly
in
order
to
produce
an
output
within
a
15
second
per-sentence
time
limit
.
Moreover
,
the
overall
Bleu
(
Papineni
et
al.
,
2002
)
and
Meteor
(
Lavie
and
Agarwal
,
2007
)
scores
,
as
well
as
numbers
of
exact
string
matches
(
as
measured
against
to
the
original
sentences
in
the
CCGbank
)
are
higher
for
the
hypertagger-seeded
realizer
than
for
the
preexisting
realizer
.
This
paper
is
structured
as
follows
:
Section
2
provides
background
on
chart
realization
in
OpenCCG
using
a
corpus-derived
grammar
.
Section
3
describes
our
hypertagging
approach
and
how
it
is
integrated
into
the
realizer
.
Section
4
describes
our
results
,
followed
by
related
work
in
Section
5
and
our
conclusions
in
Section
6
.
2
Background
2.1
Surface
Realization
with
OpenCCG
The
OpenCCG
surface
realizer
is
based
on
Steed-man
's
(
2000
)
version
of
CCG
elaborated
with
Baldridge
and
Kruijff
's
multi-modal
extensions
for
lexically
specified
derivation
control
(
Baldridge
,
2002
;
Baldridge
and
Kruijff
,
2003
)
and
hybrid
logic
dependency
semantics
(
Baldridge
and
Kruijff
,
2002
)
.
OpenCCG
implements
a
symbolic-statistical
chart
realization
algorithm
(
Kay
,
1996
;
Carroll
et
al.
,
1999
;
White
,
2006b
)
combining
(
1
)
a
theoretically
grounded
approach
to
syntax
and
semantic
composition
with
(
2
)
factored
language
models
(
Bilmes
and
Kirchhoff
,
2003
)
for
making
choices
among
the
options
left
open
by
the
grammar
.
In
OpenCCG
,
the
search
for
complete
realizations
2Note
that
the
multitagger
is
"
correct
"
if
the
correct
tag
is
anywhere
in
the
multitag
set
.
makes
use
of
n-gram
language
models
over
words
represented
as
vectors
of
factors
,
including
surface
form
,
part
of
speech
,
supertag
and
semantic
class
.
The
search
proceeds
in
one
of
two
modes
,
anytime
or
two-stage
(
packing
/
unpacking
)
.
In
the
anytime
mode
,
a
best-first
search
is
performed
with
a
configurable
time
limit
:
the
scores
assigned
by
the
n-gram
model
determine
the
order
of
the
edges
on
the
agenda
,
and
thus
have
an
impact
on
realization
speed
.
In
the
two-stage
mode
,
a
packed
forest
of
all
possible
realizations
is
created
in
the
first
stage
;
in
the
second
stage
,
the
packed
representation
is
unpacked
in
bottom-up
fashion
,
with
scores
assigned
to
the
edge
for
each
sign
as
it
is
unpacked
,
much
as
in
(
Langkilde
,
2000
)
.
Edges
are
grouped
into
equivalence
classes
when
they
have
the
same
syntactic
category
and
cover
the
same
parts
of
the
input
logical
form
.
Pruning
takes
place
within
equivalence
classes
of
edges
.
Additionally
,
to
realize
a
wide
range
of
paraphrases
,
OpenCCG
implements
an
algorithm
for
efficiently
generating
from
disjunctive
logical
forms
(
White
,
2006a
)
.
graphs
are
represented
using
Hybrid
Logic
Dependency
Semantics
(
HLDS
)
,
a
dependency-based
approach
to
representing
linguistic
meaning
developed
by
Baldridge
and
Kruijff
(
2002
)
.
In
HLDS
,
hybrid
logic
(
Blackburn
,
2000
)
terms
are
used
to
describe
dependency
graphs
.
These
graphs
have
been
suggested
as
representations
for
discourse
structure
,
and
have
their
own
underlying
semantics
(
White
,
2006b
)
.
To
more
robustly
support
broad
coverage
surface
realization
,
OpenCCG
has
recently
been
enhanced
to
greedily
assemble
fragments
in
the
event
that
the
realizer
fails
to
find
a
complete
realization
.
The
fragment
assembly
algorithm
begins
with
the
edge
for
the
best
partial
realization
,
i.e.
the
one
that
covers
the
most
elementary
predications
in
the
input
logical
form
,
with
ties
broken
according
to
the
n-gram
score
.
(
Larger
fragments
are
preferred
under
the
assumption
that
they
are
more
likely
to
be
grammatical
.
)
Next
,
the
chart
and
agenda
are
greedily
searched
for
the
best
edge
whose
semantic
coverage
is
disjoint
from
those
selected
so
far
;
this
process
repeats
until
no
further
edges
can
be
added
to
the
set
of
selected
fragments
.
In
the
final
step
,
these
fragments
are
concatenated
,
again
in
a
greedy
fashion
,
this
time
according
to
the
n-gram
score
of
the
concatenated
edges
:
starting
with
the
original
best
edge
,
the
fragment
whose
concatenation
on
the
left
or
right
side
yields
the
highest
score
is
chosen
as
the
one
to
concatenate
next
,
until
all
the
fragments
have
been
concatenated
into
a
single
output
.
2.2
Realization
from
an
Enhanced
CCGbank
White
et
al.
(
2007
)
describe
an
ongoing
effort
to
engineer
a
grammar
from
the
CCGbank
(
Hockenmaier
and
Steedman
,
2007
)
—
a
corpus
of
CCG
derivations
derived
from
the
Penn
Treebank
—
suitable
for
realization
with
OpenCCG
.
This
process
involves
converting
the
corpus
to
reflect
more
precise
analyses
,
where
feasible
,
and
adding
semantic
representations
to
the
lexical
categories
.
In
the
first
step
,
the
derivations
in
the
CCGbank
are
revised
to
reflect
the
desired
syntactic
derivations
.
Changes
to
the
derivations
are
necessary
to
reflect
the
lexicalized
treatment
of
coordination
and
punctuation
assumed
by
the
multi-modal
version
of
CCG
that
is
implemented
in
OpenCCG
.
Further
changes
are
necessary
to
support
semantic
dependencies
rather
than
surface
syn
-
tactic
ones
;
in
particular
,
the
features
and
unification
constraints
in
the
categories
related
to
semanti-cally
empty
function
words
such
complementizers
,
infinitival-to
,
expletive
subjects
,
and
case-marking
prepositions
are
adjusted
to
reflect
their
purely
syntactic
status
.
In
the
second
step
,
a
grammar
is
extracted
from
the
converted
CCGbank
and
augmented
with
logical
forms
.
Categories
and
unary
type
changing
rules
(
corresponding
to
zero
morphemes
)
are
sorted
by
frequency
and
extracted
if
they
meet
the
specified
frequency
thresholds
.
A
separate
transformation
then
uses
around
two
dozen
generalized
templates
to
add
logical
forms
to
the
categories
,
in
a
fashion
reminiscent
of
(
Bos
,
2005
)
.
The
effect
of
this
transformation
is
illustrated
below
.
Example
(
1
)
shows
how
numbered
semantic
roles
,
taken
from
PropBank
(
Palmer
et
al.
,
2005
)
when
available
,
are
added
to
the
category
of
an
active
voice
,
past
tense
transitive
verb
,
where
*
pred
*
is
a
placeholder
for
the
lexical
predicate
;
examples
(
2
)
and
(
3
)
show
how
more
specific
relations
are
introduced
in
the
category
for
determiners
and
the
category
for
the
possessive
's
,
respectively
.
After
logical
form
insertion
,
the
extracted
and
augmented
grammar
is
loaded
and
used
to
parse
the
sentences
in
the
CCGbank
according
to
the
goldstandardderivation
.
Ifthederivationcanbesuccess-fully
followed
,
the
parse
yields
a
logical
form
which
is
saved
along
with
the
corpus
sentence
in
order
to
later
test
the
realizer
.
The
algorithm
for
following
corpus
derivations
attempts
to
continue
processing
if
it
encounters
a
blocked
derivation
due
to
sentence-internal
punctuation
.
While
punctuation
has
been
partially
reanalyzed
to
use
lexical
categories
,
many
problem
cases
remain
due
to
the
CCGbank
's
reliance
on
punctuation-specific
binary
rules
that
are
not
supported
in
OpenCCG
.
Currently
,
the
algorithm
succeeds
in
creating
logical
forms
for
97.7
%
of
the
sentences
in
the
development
section
(
Sect
.
00
)
of
the
converted
CCG-bank
,
and
96.1
%
of
the
sentences
in
the
test
section
(
Sect
.
23
)
.
Of
these
,
76.6
%
of
the
development
logical
forms
are
semantic
dependency
graphs
with
a
single
root
,
while
76.7
%
of
the
test
logical
forms
have
a
single
root
.
The
remaining
cases
,
with
multiple
roots
,
are
missing
one
or
more
dependencies
required
to
form
a
fully
connected
graph
.
These
missing
dependencies
usually
reflect
inadequacies
in
the
current
logical
form
templates
.
2.3
Factored
Language
Models
Following
White
et
al.
(
2007
)
,
we
use
factored
tri-gram
models
over
words
,
part-of-speech
tags
and
supertags
to
score
partial
and
complete
realizations
.
The
language
models
were
created
using
the
SRILM
toolkit
(
Stolcke
,
2002
)
on
the
standard
training
sections
(
2-21
)
of
the
CCGbank
,
with
sentence-initial
words
(
other
than
proper
names
)
uncapital-ized
.
While
these
models
are
considerably
smaller
than
the
ones
used
in
(
Langkilde-Geary
,
2002
;
Vell-dal
and
Oepen
,
2005
)
,
the
training
data
does
have
the
advantage
of
being
in
the
same
domain
and
genre
(
using
larger
n-gram
models
remains
for
future
investigation
)
.
The
models
employ
interpolated
Kneser-Ney
smoothing
with
the
default
frequency
cutoffs
.
The
best
performing
model
interpolates
a
word
trigram
model
with
a
trigram
model
that
chains
a
POS
model
with
a
supertag
model
,
where
the
POS
model
conditions
on
the
previous
two
POS
tags
,
and
the
supertag
model
conditions
on
the
previous
two
POS
tags
as
well
as
the
current
one
.
Note
that
the
use
of
supertags
in
the
factored
language
model
to
score
possible
realizations
is
distinct
from
the
prediction
of
supertags
for
lexical
category
assignment
:
the
former
takes
the
words
in
the
local
context
into
account
(
as
in
supertagging
for
parsing
)
,
while
the
latter
takes
features
of
the
logical
form
into
account
.
It
is
this
latter
process
which
we
call
hyper-tagging
,
and
to
which
we
now
turn
.
3
The
Approach
3.1
Lexical
Smoothing
and
Search
Errors
In
White
et
al.
's
(
2007
)
initial
investigation
of
scaling
up
OpenCCG
for
broad
coverage
realization
,
test
set
grammar
complete
Table
1
:
Percentage
of
complete
realizations
using
an
oracle
n-gram
model
versus
the
best
performing
factored
language
model
.
all
categories
observed
more
often
than
a
threshold
frequency
were
instantiated
for
lexical
predicates
;
for
unseen
words
,
a
simple
smoothing
strategy
based
on
the
part
of
speech
was
employed
,
assigning
the
most
frequent
categories
for
the
POS
.
This
approach
turned
out
to
suffer
from
a
large
number
of
search
errors
,
where
the
realizer
failed
to
find
a
complete
realization
before
timing
out
even
in
cases
where
the
grammar
supported
one
.
To
confirm
that
search
errors
had
become
a
significant
issue
,
White
et
al.
compared
the
percentage
of
complete
realizations
(
versus
fragmentary
ones
)
with
their
top
scoring
model
against
an
oracle
model
that
uses
a
simplified
BLEU
score
based
on
the
target
string
,
which
is
useful
for
regression
testing
as
it
guides
the
best-first
search
to
the
reference
sentence
.
The
comparison
involved
both
a
medium-sized
(
non-blind
)
grammar
derived
from
the
development
section
and
a
large
grammar
derived
from
the
training
sections
(
the
latter
with
slightly
higher
thresholds
)
.
As
shown
in
Table
1
,
with
the
large
grammar
derived
from
the
training
sections
,
many
fewer
complete
realizations
are
found
(
before
timing
out
)
using
the
factored
language
model
than
are
possible
,
as
indicated
by
the
results
of
using
the
oracle
model
.
By
contrast
,
the
difference
is
small
with
the
medium-sized
grammar
derived
from
the
development
section
.
This
result
is
not
surprising
when
one
considers
that
a
large
number
of
common
words
are
observed
to
have
many
possible
categories
.
In
the
next
section
,
we
show
that
a
supertagger
for
CCG
realization
,
or
hypertagger
,
can
reduce
the
problem
of
search
errors
by
focusing
the
search
space
on
the
most
likely
lexical
categories
.
3.2
Maximum
Entropy
Hypertagging
As
supertagging
for
parsing
involves
studying
a
given
input
word
and
its
local
context
,
the
concep
-
tual
equivalent
for
a
lexical
predicate
in
the
LF
is
to
study
a
given
node
and
its
local
graph
structure
.
Our
implementation
makes
use
of
three
general
types
of
features
:
lexicalized
features
,
which
are
simply
the
names
of
the
parent
and
child
elementary
predication
nodes
,
graph
structural
features
,
such
as
the
total
number
of
edges
emanating
from
a
node
,
the
number
of
argument
and
non-argument
dependents
,
and
the
names
of
the
relations
of
the
dependent
nodes
to
the
parent
node
,
and
syntactico-semantic
attributes
of
nodes
,
such
as
the
tense
and
number
.
For
example
,
in
the
HLDS
graph
shown
in
Figure
1
,
the
node
representing
want
has
two
dependents
,
and
the
relational
type
of
make
with
respect
to
want
is
Arg1
.
Clark
(
2002
)
notes
in
his
parsing
experiments
that
the
POS
tags
of
the
surrounding
words
are
highly
informative
.
As
discussed
below
,
a
significant
gain
in
hypertagging
accuracy
resulted
from
including
features
sensitive
to
the
POS
tags
of
a
node
's
parent
,
the
node
itself
,
and
all
of
its
arguments
and
modifiers
.
Predicting
these
tags
requires
the
use
of
a
separate
POS
tagger
,
which
operates
in
a
manner
similar
to
the
hypertagger
itself
,
though
exploiting
a
slightly
different
set
of
features
(
e.g.
,
including
features
corresponding
to
the
four-character
prefixes
and
suffixes
of
rare
logical
predication
names
)
.
Following
the
(
word
)
supertagging
experiments
of
(
Cur-ran
et
al.
,
2006
)
we
assigned
potentially
multiple
POS
tags
to
each
elementary
predication
.
The
POS
tags
assigned
are
all
those
that
are
some
factor
/
/
of
the
highest
ranked
tag,3
giving
an
average
of
1.1
POS
tags
per
elementary
predication
.
The
values
of
the
corresponding
feature
functions
are
the
POS
tag
probabilities
according
to
the
POS
tagger
.
At
this
ambiguity
level
,
the
POS
tagger
is
correct
w
92
%
of
the
time
.
Features
for
the
hypertagger
were
extracted
from
semantic
dependency
graphs
extracted
from
sections
2
through
21
of
the
CCGbank
.
In
total
,
37,168
dependency
graphs
were
derived
from
the
corpus
,
yielding
468,628
feature
parameters
.
The
resulting
contextual
features
and
goldstandard
supertag
for
each
predication
were
then
used
to
train
a
maximum
entropy
classifier
model
.
3I.e
.
,
all
tags
t
whose
probabilities
p
(
t
)
&gt;
/
?
•
p
*
,
where
p
*
is
the
highest
ranked
tag
's
probability
.
Maximum
entropy
models
describe
a
set
of
probability
distributions
of
the
form
:
where
o
is
an
outcome
,
x
is
a
context
,
the
/
i
are
feature
functions
,
the
Ai
are
the
respective
weights
of
the
feature
functions
,
and
Z
(
x
)
is
a
normalizing
sum
over
all
competing
outcomes
.
More
concretely
,
given
an
elementary
predication
labeled
want
(
as
in
Figure
1
)
,
a
feature
function
over
this
node
could
be
:
{
1
,
if
o
is
(
s
[
dcl
]
\
np
)
/
(
s
[
adj
]
\
np
)
and
number
j
)
f_LF_dependents
(
x
)
=
2
0
,
otherwise
.
We
used
Zhang
Le
's
maximum
entropy
toolkit4
for
training
the
hypertagging
model
,
which
uses
an
implementation
of
Limited-memory
BFGS
,
an
approximate
quasi-Newton
optimization
method
from
the
numerical
optimization
literature
(
Liu
and
No-cedal
,
1989
)
.
Using
L-BFGS
allowed
us
to
include
continuous
feature
function
values
where
appropriate
(
e.g.
,
the
probabilities
of
automatically-assigned
POS
tags
)
.
We
trained
each
hypertagging
model
to
275
iterations
and
our
POS
tagging
model
to
400
iterations
.
We
used
no
feature
frequency
cut-offs
,
but
rather
employed
Gaussian
priors
with
global
variances
of
100
and
75
,
respectively
,
for
the
hypertag-ging
and
POS
tagging
models
.
During
realization
,
the
hypertagger
serves
to
probabilistically
filter
the
categories
assigned
to
an
elementary
predication
,
as
well
as
to
propose
categories
for
rare
or
unseen
predicates
.
Given
a
predication
,
the
tagger
returns
a
/
-
best
list
of
supertags
in
order
of
decreasing
probability
.
Increasing
the
number
of
categories
returned
clearly
increases
the
likelihood
that
the
most-correct
supertag
is
among
them
,
but
at
a
corresponding
cost
in
chart
size
.
Accordingly
,
the
hypertagger
begins
with
a
highly
restrictive
value
for
/
,
and
backs
off
to
progressively
less-restrictive
values
if
no
complete
realization
could
be
found
using
the
set
of
supertags
returned
.
The
search
is
restarted
4http
:
/
/
homepages.inf.ed.ac.uk
/
s0450736
/
maxent_toolkit.html
.
Table
2
:
Hypertagger
accuracy
on
Sections
00
and
23
.
Results
(
in
percentages
)
are
for
per-logical-predication
(
Pr
)
and
per-whole-graph
(
Grph
)
tagging
accurcies
.
Difference
between
best-only
and
baselines
(
b.l.
)
is
significant
(
p
&lt;
2
•
10-16
)
by
McNemar
's
x2
test
.
Tags
Pred
from
scratch
with
the
next
/
value
,
though
in
principle
the
same
chart
could
be
expanded
.
The
iterative
,
/
-
best
search
for
a
complete
realization
uses
therealizer
's
packing
mode
,
whichcanmorequickly
determine
whether
a
complete
realization
is
possible
.
If
the
halfway
point
of
the
overall
time
limit
is
reached
with
no
complete
realization
,
the
search
switches
to
best-first
mode
,
ultimately
assembling
fragments
if
no
complete
realization
can
be
found
during
the
remaining
time
.
4
Results
and
Discussion
Several
experiments
were
performed
in
training
and
applying
the
hypertagger
.
Three
different
models
were
created
using
1
)
non-lexicalized
features
only
,
2
)
all
features
excluding
POS
tags
,
3
)
all
,
3
)
all
features
except
syntactico-semantic
attributes
such
as
tense
and
number
and
4
)
all
features
available
.
Models
trained
on
these
feature
subsets
were
tested
against
one
another
on
Section
00
,
and
then
the
best
performing
model
was
run
on
both
Section
00
and
4.1
Feature
Ablation
Testing
The
the
whole
feature
set
was
found
in
feature
ablation
testing
on
the
development
set
to
outperform
all
other
feature
subsets
significantly
(
p
&lt;
2.2
•
10-16
)
.
These
results
listed
in
Table
3
.
As
we
can
see
,
taking
Table
3
:
Hypertagger
feature
ablation
testing
results
on
Section
00
.
The
full
feature
set
outperforms
all
others
significantly
(
p
&lt;
2.2
•
10-16
)
.
Results
for
per-predication
(
Pr
)
and
per-whole-graph
(
Grph
)
tagging
percentage
accuracies
are
listed
.
(
Key
:
no-POS
=
no
POS
features
;
no-attr
=
no
syntactico-semantic
attributes
such
as
tense
and
number
;
non-lex
=
non-lexicalized
features
only
(
no
predication
names
)
.
FeatureSet
away
any
one
class
of
features
leads
to
drop
in
per-predication
tagging
accuracy
of
at
least
1.8
%
and
a
drop
per-whole-graph
accuracy
of
at
least
9.2
%
.
As
expected
from
previous
work
in
supertagging
(
for
parsing
)
,
POS
features
resulted
in
a
large
improvement
in
overall
accuracy
(
1.8
%
)
.
Although
the
POS
tagger
by
itself
is
only
92
%
accurate
(
as
a
multi-tagger
of
1.1
WWorl
average
ambiguity
)
—
well
below
the
state-of-the-art
for
the
tagging
of
words
—
its
predictions
are
still
quite
valuable
to
the
hyper
-
tagger
.
4.2
Best
Model
Hypertagger
Accuracy
The
results
for
the
full
feature
set
on
Sections
00
and
23
are
outlined
in
Table
2
.
Included
in
this
table
are
accuracy
data
for
a
baseline
dummy
tagger
which
simply
assigns
the
most-frequently-seen
tag
(
s
)
for
a
given
predication
and
backs
off
to
the
overall
most
frequent
tag
(
s
)
when
confronted
with
an
unseen
predication
.
The
development
set
(
00
)
was
used
to
tune
the
/
parameter
to
obtain
reasonable
hypertag
ambiguity
levels
;
the
model
was
not
otherwise
tuned
to
it
.
The
hypertagger
achieves
high
per-predication
and
whole-graph
accuracies
even
at
small
ambiguity
levels
.
4.3
Realizer
Performance
Tables
4
and
5
show
how
the
hypertagger
improves
realization
performance
on
the
development
and
test
sections
of
the
CCGbank
.
As
Table
4
indicates
,
using
the
hypertagger
in
an
iterative
beta-best
fashion
more
than
doubles
the
number
of
grammatically
complete
realizations
found
within
the
time
Table
5
:
Realization
quality
metrics
exact
match
,
BLEU
and
METEOR
,
on
complete
realizations
only
and
overall
,
with
and
without
hypertagger
,
on
Sections
00
and
23
.
Hyper-tagger
Complete
BLEU
METEOR
Table
4
:
Percentage
of
grammatically
complete
realizations
,
runtimes
for
complete
realizations
and
overall
runtimes
,
with
and
without
hypertagger
,
on
Sections
00
and
23
.
Complete
limit
;
on
the
development
set
,
this
improvement
eli-mates
more
than
the
number
of
known
search
errors
(
cf.
Table
1
)
.
Additionally
,
by
reducing
the
search
space
,
the
hypertagger
cuts
overall
realization
times
by
more
than
half
,
and
in
the
cases
where
complete
realizations
are
found
,
realization
times
are
reduced
by
a
factoroffour
,
down
to
1.2
seconds
persentence
on
a
desktop
Linux
PC
.
Table
5
shows
that
increasing
the
number
of
complete
realizations
also
yields
improved
BLEU
and
METEOR
scores
,
as
well
as
more
exact
matches
.
In
particular
,
the
hypertagger
makes
possible
a
more
than
6-point
improvement
in
the
overall
BLEU
score
on
both
the
development
and
test
sections
,
and
a
more
than
12-point
improvement
on
the
sentences
with
complete
realizations
.
As
the
effort
to
engineer
a
grammar
suitable
for
realization
from
the
CCGbank
proceeds
in
parallel
to
our
work
on
hypertagging
,
we
expect
the
hypertagger-seeded
realizer
to
continue
to
improve
,
since
a
more
complete
and
precise
extracted
grammar
should
enable
more
complete
realizations
to
be
found
,
and
richer
semantic
representations
should
simplify
the
hypertagging
task
.
Even
with
the
current
incomplete
set
of
semantic
templates
,
the
hy-pertagger
brings
realizer
performance
roughly
up
to
state-of-the-art
levels
,
as
our
overall
test
set
BLEU
score
(
0.6701
)
slightly
exceeds
that
of
Cahill
and
van
Genabith
(
2006
)
,
though
at
a
coverage
of
96
%
instead
of98
%
.
We
caution
,
however
,
thatitremains
unclear
how
meaningful
it
is
to
directly
compare
these
scores
when
the
realizer
inputs
vary
considerably
in
their
specificity
,
as
Langkilde-Geary
's
(
2002
)
experiments
dramatically
illustrate
.
5
Related
Work
Our
approach
follows
Langkilde-Geary
(
2002
)
and
Callaway
(
2003
)
in
aiming
to
leverage
the
Penn
Treebank
to
develop
a
broad-coverage
surface
re-alizer
for
English
.
However
,
while
these
earlier
,
generation-only
approaches
made
use
of
converters
for
transforming
the
outputs
of
Treebank
parsers
to
inputs
for
realization
,
our
approach
instead
employs
a
shared
bidirectional
grammar
,
so
that
the
input
to
realization
is
guaranteed
to
be
the
same
logical
form
constructed
by
the
parser
.
In
this
regard
,
our
approach
is
more
similar
to
the
ones
pursued
more
recently
by
Carroll
,
Oepen
and
Velldal
(
2005
;
2005
;
2006
)
,
Nakanishi
et
al.
(
2005
)
and
Cahill
and
van
Genabith
(
2006
)
with
HPSG
and
LFG
grammars
.
While
we
consider
our
approach
to
be
the
first
to
employ
a
supertagger
for
realization
,
or
hypertagger
,
the
approach
is
clearly
reminiscent
of
the
LTAG
tree
models
of
Srinivas
and
Rambow
(
2000
)
.
The
main
difference
between
the
approaches
is
that
ours
consists
of
a
multitagging
step
followed
by
the
bottom-up
construction
of
a
realization
chart
,
while
theirs
involves
the
top-down
selection
of
the
single
most
likely
supertag
for
each
node
that
is
grammatically
compatible
with
the
parent
node
,
with
the
probability
conditioned
only
on
the
child
nodes
.
Note
that
although
their
approach
does
involve
a
subsequent
lattice
construction
step
,
it
requires
making
non-standard
assumptions
about
the
TAG
;
in
contrast
,
ours
follows
the
chart
realization
tradition
of
working
with
the
same
operations
of
grammatical
combination
as
in
parsing
,
including
a
well-defined
notion
of
semantic
composition
.
Additionally
,
as
our
tagger
employs
maximum
entropy
modeling
,
it
is
able
to
take
into
account
a
greater
variety
of
contextual
features
,
including
those
derived
from
parent
nodes
.
In
comparison
to
other
recent
chart
realization
approaches
,
Nakanishi
et
al.
's
is
similar
to
ours
in
that
it
employs
an
iterative
beam
search
,
dynamically
changing
the
beam
size
in
order
to
cope
with
the
large
search
space
.
However
,
their
log-linear
selection
models
have
been
adapted
from
ones
used
in
parsing
,
and
do
not
condition
choices
based
on
features
of
the
input
semantics
to
the
same
extent
.
In
particular
,
while
they
employ
a
baseline
maximum
likelihood
model
that
conditions
the
probability
of
a
lexical
entry
upon
its
predicate
argument
structure
(
PAS
)
—
that
is
,
the
set
of
elementary
predications
introduced
by
the
lexical
item
—
this
probability
does
not
take
into
account
other
elements
of
the
local
context
,
including
parents
and
modifiers
,
and
their
lexical
predicates
.
Similarly
,
Cahill
and
van
Genabith
condition
the
probability
of
their
lexical
rules
on
the
set
of
feature-value
pairs
linked
to
the
RHS
of
the
rule
,
but
do
not
take
into
account
any
additional
context
.
Since
their
probabilistic
models
involve
independence
assumptions
like
those
in
a
PCFG
,
and
since
they
do
not
employ
n-grams
for
scoring
alternative
realizations
,
their
approach
only
keeps
the
single
most
likely
edge
in
an
equivalence
class
,
rather
than
packing
them
into
a
forest
.
Carroll
,
Oepen
and
Velldal
's
approach
is
like
Nakanishi
et
al.
's
in
that
they
adapt
log-linear
parsing
models
to
the
realization
task
;
however
,
they
employ
manually
written
grammars
on
much
smaller
corpora
,
and
perhaps
for
this
reason
they
have
not
faced
the
need
to
employ
an
iterative
beam
search
.
6
Conclusion
We
have
introduced
a
novel
type
of
supertagger
,
which
we
have
dubbed
a
hypertagger
,
that
assigns
CCG
category
labels
to
elementary
predications
in
a
structured
semantic
representation
with
high
accuracy
at
several
levels
of
tagging
ambiguity
in
a
fashion
reminiscent
of
(
Bangalore
and
Rambow
,
2000
)
.
To
our
knowledge
,
we
are
the
first
to
report
tagging
results
in
the
semantic-to-syntactic
direction
.
We
have
also
shown
that
,
by
integrating
this
hy-pertagger
with
a
broad-coverage
CCG
chart
real-izer
,
considerably
faster
realization
times
are
possible
(
approximately
twice
as
fast
as
compared
with
a
realizer
that
performs
simple
lexical
look-ups
)
with
higher
BLEU
,
METEOR
and
exact
string
match
scores
.
Moreover
,
the
hypertagger-augmented
real-izer
finds
more
than
twice
the
number
of
complete
realizations
,
and
further
analysis
revealed
that
the
realization
quality
(
as
per
modified
BLEU
and
METEOR
)
is
higher
in
the
cases
when
the
realizer
finds
a
complete
realization
.
This
suggests
that
further
improvements
to
the
hypertagger
will
lead
to
more
complete
realizations
,
hence
more
high-quality
realizations
.
Finally
,
further
efforts
to
engineer
a
grammar
suitable
for
realization
from
the
CCGbank
should
provide
richer
feature
sets
,
which
,
as
our
feature
ablation
study
suggests
,
are
useful
for
boosting
hypertagging
performance
,
hence
for
finding
better
and
more
complete
realizations
.
Acknowledgements
The
authors
thank
the
anonymous
reviewers
,
Chris
Brew
,
Detmar
Meurers
and
Eric
Fosler-Lussier
for
helpful
comments
and
discussion
.
