In
this
paper
we
describe
a
dependency
parser
that
uses
exact
search
and
global
learning
(
Crammer
et
al.
,
2006
)
to
produce
labelled
dependency
trees
.
Our
system
integrates
the
task
of
learning
tree
structure
and
learning
labels
in
one
step
,
using
the
same
set
of
features
for
both
tasks
.
During
label
prediction
,
the
system
automatically
selects
for
each
feature
an
appropriate
level
of
smoothing
.
We
report
on
several
experiments
that
we
conducted
with
our
system
.
In
the
shared
task
evaluation
,
it
scored
better
than
average
.
1
Introduction
Dependency
parsing
is
a
topic
that
has
engendered
increasing
interest
in
recent
years
.
One
promising
approach
is
based
on
exact
search
and
structural
learning
(
McDonald
et
al.
,
2005
;
McDonald
and
Pereira
,
2006
)
.
In
this
work
we
also
pursue
this
approach
.
Our
system
makes
no
provisions
for
non-projective
edges
.
In
contrast
to
previous
work
,
we
aim
to
learn
labelled
dependency
trees
at
one
fell
swoop
.
This
is
done
by
maintaining
several
copies
of
feature
vectors
that
capture
the
features
'
impact
on
predicting
different
dependency
relations
(
deprels
)
.
In
order
to
preserve
the
strength
of
McDonald
et
al.
(
2005
)
'
s
approach
in
terms
of
unlabelled
attachment
score
,
we
add
feature
vectors
for
generalizations
over
deprels
.
We
also
employ
various
reversible
transformations
to
reach
treebank
formats
that
better
match
our
feature
representation
and
that
reduce
the
complexity
of
the
learning
task
.
The
paper
first
presents
the
methodology
used
,
goes
on
to
describe
experiments
and
results
and
finally
concludes
.
2
Methodology
2.1
Parsing
Algorithm
In
our
approach
,
we
adopt
Eisner
(
1996
)
'
s
bottom-up
chart-parsing
algorithm
in
McDonald
et
al.
(
2005
)
'
s
formulation
,
which
finds
the
best
projective
dependency
tree
for
an
input
string
$(x_1, \ldots, x_n)$
.
We
assume
that
every
possible
head-dependent
pair
is
described
by
a
feature
vector
with
associated
weights
.
Eisner
's
algorithm
achieves
optimal
tree
packing
by
storing
partial
structures
in
two
matrices
b
and
L
.
First
the
diagonals
of
the
matrices
are
initialized
with
0
;
then
all
other
cells
are
filled
according
to
eqs
.
(
1
)
and
(
2
)
and
their
symmetric
variants
.
This
algorithm
only
accommodates
features
for
single
links
in
the
dependency
graph
.
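The chart computation described above can be sketched as follows. This is a minimal first-order version that returns only the score of the best projective tree, assuming an artificial root at position 0 and a hypothetical arc-score matrix `score[h][d]`; the table names `complete` and `incomplete` are illustrative and do not necessarily match the two matrices of the text.

```python
from typing import List

NEG_INF = float("-inf")


def eisner_best_score(score: List[List[float]]) -> float:
    """Score of the best projective tree; score[h][d] is the weight of arc h -> d.

    Position 0 is an artificial root. Direction index 1 means the head is the
    left end of the span, direction index 0 means the head is the right end.
    """
    n = len(score)
    # Diagonal cells start at 0, all other cells are filled bottom-up.
    complete = [[[0.0, 0.0] if s == t else [NEG_INF, NEG_INF]
                 for t in range(n)] for s in range(n)]
    incomplete = [[[NEG_INF, NEG_INF] for _ in range(n)] for _ in range(n)]

    for span in range(1, n):
        for s in range(n - span):
            t = s + span
            # Join two complete half-spans and add one arc between s and t.
            best = max(complete[s][r][1] + complete[r + 1][t][0]
                       for r in range(s, t))
            incomplete[s][t][0] = best + score[t][s]  # arc t -> s
            incomplete[s][t][1] = best + score[s][t]  # arc s -> t
            # Extend an incomplete span with a complete span on the far side.
            complete[s][t][0] = max(complete[s][r][0] + incomplete[r][t][0]
                                    for r in range(s, t))
            complete[s][t][1] = max(incomplete[s][r][1] + complete[r][t][1]
                                    for r in range(s + 1, t + 1))
    return complete[0][n - 1][1]
```

For a two-word input with `score = [[0, 1, 1], [0, 0, 5], [0, 2, 0]]`, the best tree attaches word 1 to the root and word 2 to word 1. Recovering the tree itself would additionally require back-pointers, which are omitted here for brevity.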
We
also
investigated
an
extension
,
McDonald
and
Pereira
(
2006
)
'
s
second-order
model
,
where
more
of
the
parsing
history
is
taken
into
account
,
viz
.
the
last
dependent
assigned
to
a
head
i.
In
the
extended
model
,
b
is
updated
as
defined
in
eq
.
(
3
)
;
optimal
packing
requires
a
third
matrix
M.
2.2
Feature
Representation
[Table: feature templates over w, fp and lcp, including features for root words]
All
features
but
unary
token
features
were
optionally
extended
with
direction
of
dependency
(
left
or
right
)
and
binned
token
distance
($|i - j|$ = 1, 2, 3, 4, >5, >10)
.
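As an illustration of this optional extension, the sketch below derives the direction and distance variants of a base feature. The string format of the features and the names `bin_distance` and `extend_feature` are hypothetical; placing the boundary distance 5 in the ">5" bin is likewise an assumption, since the bin list does not say where it belongs.

```python
from typing import List


def bin_distance(i: int, j: int) -> str:
    """Map the token distance |i - j| to one of the bins 1, 2, 3, 4, >5, >10."""
    d = abs(i - j)
    if d > 10:
        return ">10"
    if d >= 5:
        # Assumption: distance 5 itself is grouped with the ">5" bin.
        return ">5"
    return str(d)


def extend_feature(base: str, head: int, dep: int) -> List[str]:
    """Return the base feature plus its direction and direction+distance variants."""
    direction = "R" if head < dep else "L"  # dependent to the right or left of the head
    with_dir = f"{base}|dir={direction}"
    with_dist = f"{with_dir}|dist={bin_distance(head, dep)}"
    return [base, with_dir, with_dist]
```

Keeping the unextended feature alongside the extended ones lets the learner back off to the coarser variant when the specific direction and distance combination is rare.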
2.3
Structural
Learning
For
determining
feature
weights
w
,
we
used
online
passive-aggressive
learning
(
OPAL
)
(
Crammer
et
al.
,
2006
)
.
OPAL
iterates
repeatedly
over
all
training
instances
,
adapting
weights
after
each
parse
.
It
tries
to
change
weights
as
little
as
possible
(
passiveness
)
,
while
ensuring
that
(
1
)
the
correct
tree
y
gets
at
least
as
much
weight
as
the
best
parse
tree
and
(
2
)
the
difference
in
weight
between
the
two
rises
with
the
average
number
of
errors
in
the
best
parse
(
aggressiveness
)
.
This
optimization
problem
has
a
closed-form
solution
:
1
Agreement
was
computed
from
morphological
features
,
viz
.
gender
,
number
and
person
,
and
case
.
In
languages
with
subject-verb
agreement
,
we
added
a
nominative
case
feature
to
finite
verbs
.
In
Basque
,
agreement
is
case-specific
(
absolutive
,
dative
,
ergative
,
other
case
)
.
Table
1
:
Performance
on
devset
of
Italian
treebank
.
In
parentheses
:
reduction
to
non-null
features
after
first
iteration
.
Having
a
closed-form
solution
,
OPAL
is
easier
to
implement
and
more
efficient
than
the
MIRA
algorithm
used
by
McDonald
et
al.
(
2005
)
,
although
it
achieves
a
performance
comparable
to
MIRA
's
on
many
problems
(
Crammer
et
al.
,
2006
)
.
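A single passive-aggressive step of the kind described in this section can be sketched as follows, assuming sparse feature vectors stored as dictionaries. The function name `pa_update` and the `cost` parameter are hypothetical: `cost` stands for the required margin between the correct tree and the best parse, and how it is derived from the number of errors (Crammer et al. (2006) use the square root of the cost in their cost-sensitive variant) is left to the caller.

```python
from typing import Dict


def pa_update(weights: Dict[str, float],
              gold_feats: Dict[str, float],
              pred_feats: Dict[str, float],
              cost: float) -> float:
    """Perform one passive-aggressive step in place and return the step size tau."""
    # Difference of the feature vectors of the correct tree and the best parse.
    diff = dict(gold_feats)
    for name, value in pred_feats.items():
        diff[name] = diff.get(name, 0.0) - value
    sq_norm = sum(v * v for v in diff.values())
    if sq_norm == 0.0:
        return 0.0  # identical feature vectors: nothing to update
    # Margin: score of the correct tree minus score of the best parse.
    margin = sum(weights.get(name, 0.0) * v for name, v in diff.items())
    # Passive if the margin already covers the cost, aggressive otherwise.
    loss = max(0.0, cost - margin)
    tau = loss / sq_norm
    for name, v in diff.items():
        weights[name] = weights.get(name, 0.0) + tau * v
    return tau
```

After a non-zero step the margin equals `cost` exactly, which is the smallest change to the weights that satisfies the constraint; a second call with the same trees therefore returns a step size of 0.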
2.4
Learning
Labels
for
Dependency
Relations
So
far
,
the
presented
system
,
which
follows
closely
the
approach
of
McDonald
et
al.
(
2005
)
,
only
predicts
unlabelled
dependency
trees
.
To
derive
a
labelling
,
we
departed
from
their
approach
:
We
split
each
feature
along
the
deprel
label
dimension
,
so
that
each
deprel
is
associated
with
its
own
feature
vector
(
cf.
eq
.
(
4
)
,
where
⊗
is
the
tensor
product
and
the
orthogonal
encoding
)
.
In
parsing
,
we
only
consider
the
best
deprel
label
.
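The effect of splitting features along the label dimension can be illustrated with a small sketch. It assumes binary features and stores the weight of each (feature, deprel) pair under a tuple key, which corresponds to one coordinate of the tensor product with a one-hot label encoding; the names `score_edge` and `best_label` and the example deprels are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Weights = Dict[Tuple[str, str], float]


def score_edge(weights: Weights, feats: Iterable[str], label: str) -> float:
    """Score of one head-dependent pair under a given deprel label."""
    return sum(weights.get((feat, label), 0.0) for feat in feats)


def best_label(weights: Weights, feats: List[str],
               labels: List[str]) -> Tuple[str, float]:
    """Return the highest-scoring label for an edge together with its score."""
    scored = [(score_edge(weights, feats, label), label) for label in labels]
    top_score, top_label = max(scored)
    return top_label, top_score
```

Because only the best label per edge is kept, the parser can use the returned score as the edge weight in the chart, so that search over tree structures is unchanged.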
On
its
own
,
this
simple
approach
led
to
a
severe
degradation
of
performance
,
so
we
took
a
step
back
by
re-introducing
features
for
unlabelled
trees
.
For
each
set
of
deprels
,
we
designed
a
taxonomy
with
a
single
maximal
element
(
complete
abstraction
over
deprel
labels
)
and
one
minimal
element
for
each
deprel
label
.
We
also
included
an
intermediate
layer
in
T
that
collects
classes
of
deprels
,
such
as
complement
,
adjunct
,
marker
,
punctuation
,
or
coordination
deprels
,
and
in
this
way
provides
for
better
smoothing
.
Table
2
:
Figures
for
Experiments
on
Treebanks
.
The
taxonomy
translates
to
an
encoding
,
in
which
a
component
is
1
iff
the
corresponding
node
in
T
is
an
ancestor
of
the
deprel
label
Tsochantaridis
et
al.
,
2004
)
.
Substituting
this
encoding
for
the
orthogonal
one
leads
to
a
massive
number
of
features
,
so
we
pruned
the
taxonomy
on
a
feature-to-feature
basis
by
merging
all
nodes
on
a
level
that
only
encompass
deprels
that
never
occur
with
this
feature
in
the
training
data
.
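A taxonomy-based encoding of this kind can be sketched as follows. The three-level taxonomy, the deprel names and the function names are hypothetical, and treating each label as its own ancestor (so that the minimal element contributes a weight too) is an assumption; the pruning step is not shown.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical taxonomy: leaf deprels, an intermediate class layer, one maximal element.
PARENT: Dict[str, Optional[str]] = {
    "SUBJ": "complement", "OBJ": "complement", "MOD": "adjunct",
    "complement": "TOP", "adjunct": "TOP", "TOP": None,
}


def ancestors(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the node itself followed by all its ancestors up to the maximal element."""
    chain: List[str] = []
    current: Optional[str] = node
    while current is not None:
        chain.append(current)
        current = parent[current]
    return chain


def taxonomy_score(weights: Dict[Tuple[str, str], float], feats: Iterable[str],
                   label: str, parent: Dict[str, Optional[str]]) -> float:
    """Sum the weights of every (feature, taxonomy node) pair active for this label."""
    nodes = ancestors(label, parent)
    return sum(weights.get((feat, node), 0.0) for feat in feats for node in nodes)
```

Weights attached to the maximal element are shared by all labels and thus play the role of unlabelled features, while weights on the intermediate layer are shared within a class of deprels.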
2.5
Treebank
Transformations
Having
no
explicit
feature
representation
for
the
information
in
the
morphological
features
slot
(
cf.
section
2.2
)
,
we
partially
redistributed
that
information
to
other
slots
:
Verb
form
,
case2
to
fp
,
semantic
classification
to
an
empty
lemma
slot
(
Turkish
affixes
,
e.g.
"
Able
"
,
"
Ly
"
)
.
The
balance
between
fp
and
w
was
not
always
optimal
;
we
used
a
fine-grained3
classification
in
punctuation
tags
,
distinguished
between
prepositions
(
e.g.
in
)
and
preposition-article
combinations
(
e.g.
nel
)
in
Italian4
on
the
basis
of
number
/
gender
features
,
and
collected
definite
and
indefinite
articles
under
one
common
fp
tag
.
When
distinctions
in
deprels
are
recoverable
from
context
,
we
removed
them
:
The
dichotomy
between
conjunctive
and
disjunctive
coordination
in
Italian
depends
in
most
cases
exclusively
on
the
coordinating
conjunction
.
2
Case
was
transferred
to
fp
only
if
important
for
determination
of
deprel
(
CA
,
HU
,
IT
)
.
3
Classes
of
punctuation
are
e.g.
opening
and
closing
brackets
,
commas
and
punctuation
signalling
the
end
of
a
sentence
.
4
Prep
and
PrepArt
behave
differently
syntactically
(
e.g.
an
article
can
only
follow
a
genuine
preposition
)
.
The
Greek
and
Czech
treebanks
have
a
generic
distinction
between
ordinary
deprels
and
deprels
in
a
coordination
,
apposition
,
and
parenthesis
construction
.
In
Greek
,
we
got
rid
of
the
parenthesis
markers
on
deprels
by
switching
head
and
dependent
,
giving
the
former
head
(
the
parenthesis
)
a
unique
new
deprel
.
For
Czech
,
we
reduced
the
number
of
deprels
from
46
to
34
by
swapping
the
deprels
of
conjuncts
,
appositions
,
etc.
and
their
heads
(
coordination
or
comma
)
.
Sometimes
,
multiple
conjuncts
take
different
deprels
.
We
only
provided
for
the
clash
between
"
ExD
"
(
ellipsis
)
and
other
deprels
,
in
which
case
we
added
"
ExD
"
,
see
below
.
rozliseni
standard
-
Apos
:
ExD
In
Basque
,
agreement
is
usually
between
arguments
and
auxiliary
verbs
,
so
we
re-attached5
relevant
arguments
from
main
verb
to
auxiliary
verb
.
Unfortunately
,
we
did
not
take
into
account
projectivity
,
so
this
step
resulted
in
a
steep
increase
of
non-projective
edges
(
9.4
%
of
all
edges
)
and
a
corresponding
degradation
of
our
evaluation
results
in
Basque
.
Table
3
:
Results
on
DevTest
and
Test
Sets
compared
with
the
Average
Performance
in
CoNLL'07
.
LAS
=
Labelled
Attachment
Score
,
UAS
=
Unlabelled
Attachment
Score
,
LAcc
=
Label
Accuracy
,
AV
=
Average
score
.
The
training
set
for
Arabic
contains
some
very
long
sentences
(
up
to
396
tokens
)
.
Since
context-free
parsing
of
sentences
of
this
length
is
tedious
,
we
split
up
all
sentences
at
final
punctuation
signs
(
AuxK
)
.
With
this
trick
,
we
pushed
down
maximal
sentence
length
to
196
.
Unfortunately
,
we
overlooked
the
fact
that
in
Turkish
,
the
ROOT
deprel
not
only
designates
root
nodes
but
also
attaches
some
punctuation
marks
.
This
often
leads
to
non-projective
structures
,
which
our
parser
cannot
handle
,
so
our
parser
scored
below
average
in
Turkish
.
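Whether an attachment is non-projective can be checked with a short sketch like the one below. It assumes a valid tree given as a list `heads` in which position 0 is an artificial root and `heads[d]` is the head of token `d`; the function names are hypothetical.

```python
from typing import List


def is_projective_edge(heads: List[int], dep: int) -> bool:
    """True if every token between dep and its head is dominated by that head."""
    head = heads[dep]
    lo, hi = min(head, dep), max(head, dep)
    for k in range(lo + 1, hi):
        ancestor = k
        # Climb from k until the head of the edge or the artificial root is reached.
        while ancestor != head and ancestor != 0:
            ancestor = heads[ancestor]
        if ancestor != head:
            return False
    return True


def non_projective_edges(heads: List[int]) -> List[int]:
    """Return the dependents (1-based) whose incoming edge is non-projective."""
    return [d for d in range(1, len(heads)) if not is_projective_edge(heads, d)]
```

In the second test case below, token 2 hangs directly from the root while the edge from token 3 to token 1 spans it, which is the configuration that root-attached punctuation produces.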
In
after-deadline
experiments
,
we
took
this
feature
of
the
Turkish
treebank
into
account
and
achieved
above-average
results
by
re-linking
all
ROOT-ed
punctuation
signs
to
the
immediately
preceding
token
.
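The re-linking step can be sketched as follows. The `Token` record and its fields are hypothetical, and two points are assumptions rather than details taken from the text: the deprel of a re-linked token is left unchanged, and a punctuation mark in first position is left attached to the root because it has no preceding token.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    form: str
    head: int       # 1-based position of the head, 0 = artificial root
    deprel: str
    is_punct: bool


def relink_root_punctuation(sentence: List[Token]) -> List[Token]:
    """Attach every ROOT-ed punctuation token to the immediately preceding token."""
    relinked: List[Token] = []
    for position, token in enumerate(sentence, start=1):
        rooted = token.head == 0 and token.deprel == "ROOT"
        if token.is_punct and rooted and position > 1:
            token = token._replace(head=position - 1)
        relinked.append(token)
    return relinked
```

The sketch does not check that the sentence keeps a genuine root afterwards; a full implementation would also need to record which tokens were changed so that the transformation can be reversed on the parser output.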
3
Experiments
and
Results
deprel
labels
)
.
The
last
column
in
Table
2
shows
the
average
time
needed
in
a
training
iteration
.
For
nearly
all
languages
,
our
approach
achieved
a
performance
better
than
average
(
see
Table
3
)
.
Only
in
Turkish
and
Basque
did
we
score
below
average
.
On
closer
inspection
,
we
saw
that
this
performance
was
due
to
our
projectivity
assumption
and
to
insufficient
exploration
of
these
treebanks
.
In
its
bottom
part
,
Table
3
gives
results
of
improved
versions
of
our
approach
.
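For reference, the three scores reported in Table 3 can be computed as in the following sketch, assuming gold and predicted analyses are given as one (head, deprel) pair per token. The function name and the data layout are illustrative; the official shared task evaluation script may differ in details such as token selection.

```python
from typing import Dict, List, Tuple

Analysis = List[Tuple[int, str]]  # one (head, deprel) pair per token


def attachment_scores(gold: Analysis, pred: Analysis) -> Dict[str, float]:
    """Compute LAS, UAS and label accuracy as fractions of all tokens."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    total = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / total   # head correct
    lacc = sum(g[1] == p[1] for g, p in zip(gold, pred)) / total  # label correct
    las = sum(g == p for g, p in zip(gold, pred)) / total         # both correct
    return {"LAS": las, "UAS": uas, "LAcc": lacc}
```

LAS can never exceed either UAS or LAcc, since a token counts towards it only when head and label are both correct.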
4
Conclusion
We
presented
an
approach
to
dependency
parsing
that
is
based
on
exact
search
and
global
learning
.
Special
emphasis
is
laid
on
an
integrated
derivation
of
labelled
and
unlabelled
dependency
trees
.
We
also
employed
various
transformation
techniques
to
reach
treebank
formats
that
are
better
suited
to
our
approach
.
The
approach
scores
better
than
average
in
(
nearly
)
all
languages
.
Nevertheless
,
it
is
still
a
long
way
from
cutting-edge
performance
.
One
direction
we
would
like
to
explore
in
the
future
is
the
integration
of
dynamic
features
on
deprel
labels
.
Acknowledgements
We
would
like
to
thank
the
organizing
team
for
making
possible
again
a
great
shared
task
at
CoNLL
!
