We present a two-stage multilingual dependency parsing system submitted to the Multilingual Track of CoNLL-2007. The parser first identifies dependencies using a deterministic parsing method and then labels those dependencies as a sequence labeling problem. We describe the features used in each stage. For the four languages with different values of ROOT, we design special features for the ROOT labeler. Finally, we present evaluation results and an error analysis focusing on Chinese.
1 Introduction
The CoNLL-2007 shared task includes two tracks: the Multilingual Track and the Domain Adaptation Track (Nivre et al., 2007). We took part in the Multilingual Track for all ten languages provided by the CoNLL-2007 shared task organizers (Hajic et al., 2004; Aduriz et al., 2003; Marti et al., 2007; Chen et al., 2003; Bohmova et al., 2003; Marcus et al., 1993; Johansson and Nugues, 2007; Prokopidis et al., 2005; Csendes et al., 2005; Montemagni et al., 2003; Oflazer et al., 2003).
In this paper, we describe a two-stage parsing system, consisting of an unlabeled parser and a sequence labeler, which was submitted to the Multilingual Track. At the first stage, we use the parsing model proposed by Nivre (2003) to assign the arcs between the words, from which we obtain an unlabeled dependency tree. At the second stage, we use an SVM-based approach (Kudo and Matsumoto, 2001) to tag each arc with a dependency label, treating the labeling as a sequence labeling problem. We design special features for tagging the labels of ROOT for Arabic, Basque, Czech, and Greek, which have different labels for ROOT. The experimental results show that our approach achieves higher scores than the average.
2 Two-Stage Parsing
The unlabeled parser predicts unlabeled directed dependencies. It is based primarily on the parsing model described by Nivre (2003). The algorithm builds a dependency parsing tree in one left-to-right pass over the input and uses a stack to store the processed tokens. The behavior of the parser is defined by four elementary actions (where TOP is the token on top of the stack and NEXT is the next token in the original input string):
• Left-Arc (LA): Add an arc from NEXT to TOP; pop the stack.
• Right-Arc (RA): Add an arc from TOP to NEXT; push NEXT onto the stack.
• Reduce (RE): Pop the stack.
• Shift (SH): Push NEXT onto the stack.
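The four actions can be sketched as follows. This is an illustrative reimplementation, not MaltParser's code; the classifier that chooses the next action is omitted, and a gold action sequence is supplied by hand.

```python
# Minimal sketch of the four arc-eager actions described above
# (illustrative; the SVM classifier that predicts actions is omitted).

def parse(words, actions):
    """Apply a sequence of LA/RA/RE/SH actions and return the arcs
    as (head, dependent) index pairs."""
    stack = []   # indices of partially processed tokens; stack[-1] is TOP
    arcs = []
    i = 0        # index of NEXT in the input
    for act in actions:
        if act == "LA":        # arc from NEXT to TOP; pop the stack
            arcs.append((i, stack.pop()))
        elif act == "RA":      # arc from TOP to NEXT; push NEXT
            arcs.append((stack[-1], i))
            stack.append(i)
            i += 1
        elif act == "RE":      # pop the stack
            stack.pop()
        elif act == "SH":      # push NEXT onto the stack
            stack.append(i)
            i += 1
    return arcs

# "the dog barks": "dog" heads "the", "barks" heads "dog"
print(parse(["the", "dog", "barks"], ["SH", "LA", "SH", "LA", "SH"]))
# [(1, 0), (2, 1)]
```

Note that LA and RE both shrink the stack, while RA and SH both consume NEXT; this is what makes the single left-to-right pass possible.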
Although Nivre et al. (2006) used the pseudo-projective approach to process non-projective dependencies, here we only derive projective dependency trees. We use MaltParser (Nivre et al., 2006) V0.4 to implement the unlabeled parser, with an SVM model as the classifier. More specifically, MaltParser uses LIBSVM (Chang and Lin, 2001) with a quadratic kernel and the built-in one-versus-all strategy for multi-class classification.
MaltParser implements a history-based parsing model, which relies on features of the derivation history to predict the next parser action. The features are extracted from the fields of the data representation, including FORM, LEMMA, CPOSTAG, POSTAG, and FEATS. The features we use for all languages are listed as follows:
• The FORM features: the FORM of TOP and NEXT, the FORM of the token immediately before NEXT in the original input string, and the FORM of the head of TOP.
• The LEMMA features: the LEMMA of TOP and NEXT, the LEMMA of the token immediately before NEXT in the original input string, and the LEMMA of the head of TOP.
• The CPOS features: the CPOSTAG of TOP and NEXT, and the CPOSTAG of the token immediately to the left of the head of TOP.
• The POS features: the POSTAG of TOP and NEXT, the POSTAG of the next three tokens after NEXT, the POSTAG of the token immediately before NEXT in the original input string, the POSTAG of the token immediately below TOP, and the POSTAG of the token immediately after the rightmost dependent of TOP.
• The FEATS features: the FEATS of TOP and NEXT.
Note, however, that the LEMMA and FEATS fields are not available for all languages.
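As an illustration of how such features could be read off a parser configuration, here is a sketch of the POS features above. This is not MaltParser's actual feature-specification mechanism; the function and feature names are illustrative, and field names follow the CoNLL 2007 column format.

```python
# Illustrative extraction of the POS features listed above from a
# parser configuration (tokens, stack, index of NEXT).

def pos_features(tokens, stack, i):
    """tokens: list of dicts with a "POSTAG" key; stack: token indices;
    i: index of NEXT. Returns a feature-name -> value dict."""
    def pos(j):
        return tokens[j]["POSTAG"] if 0 <= j < len(tokens) else "NONE"

    feats = {
        "TOP.pos": pos(stack[-1]) if stack else "NONE",
        "NEXT.pos": pos(i),
        "NEXT-1.pos": pos(i - 1),       # token immediately before NEXT
    }
    for k in (1, 2, 3):                 # the next three tokens after NEXT
        feats["NEXT+%d.pos" % k] = pos(i + k)
    return feats
```

The "NONE" value stands in for positions that fall outside the sentence, so the feature vector has the same shape for every configuration.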
We denote by x = x1, ..., xn a sentence with n words and by y a corresponding dependency tree. A dependency tree is represented from ROOT to leaves with a set of ordered pairs (xi, xj) ∈ y, in which xj is a dependent and xi is the head. We have produced the dependency tree y at the first stage. In this stage, we assign a label l(i,j) to each pair, and we consider a first-order Markov chain of labels.

1 The tool is available at http://w3.msi.vxu.se/~nivre/research/MaltParser.html
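Under a first-order Markov chain, the best label sequence over the arcs can be found with Viterbi search. The sketch below uses toy per-arc scores in place of the SVM outputs; YamCha's actual decoding differs, and all names and scores here are illustrative.

```python
# Viterbi decoding over a first-order Markov chain of dependency labels.
# emit[t][l] scores label l for arc t; trans[(p, l)] scores label l
# following label p (missing transitions score 0.0).

def viterbi(emit, trans, labels):
    # best[l] = (score of best path ending in l, that path)
    best = {l: (emit[0][l], [l]) for l in labels}
    for scores in emit[1:]:
        nxt = {}
        for l in labels:
            s, path = max(
                (best[p][0] + trans.get((p, l), 0.0) + scores[l], best[p][1])
                for p in labels
            )
            nxt[l] = (s, path + [l])
        best = nxt
    return max(best.values())[1]   # path with the highest final score
```

With all transition scores at zero this reduces to picking the best label per arc; nonzero transitions let a confident neighboring label pull an ambiguous arc toward a compatible label.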
We used the package YamCha (V0.33) to implement the SVM model for labeling. YamCha is a powerful tool for sequence labeling (Kudo and Matsumoto, 2001). After the first stage, we know the unlabeled dependency parsing tree for the input sentence. This information forms the basis for part of the features of the second stage.
For the sequence labeler, we define the individual features, the pair features, the verb features, the neighbor features, and the position features. All the features are listed as follows:
• The individual features: the FORM, the LEMMA, the CPOSTAG, the POSTAG, and the FEATS of the parent and child nodes.
• The pair features: the direction of the dependency, the combination of the lemmata of the parent and child nodes, the combination of the parent's LEMMA and the child's CPOSTAG, the combination of the parent's CPOSTAG and the child's LEMMA, and the combination of the FEATS of the parent and child.
• The verb features: whether the parent or child is the first or last verb in the sentence.
2 YamCha is available at http://chasen.org/~taku/software/yamcha/
• The position features: whether the child is the first or last word in the sentence, and whether the child is the first word to the left or right of the parent.
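The pair and position features above could be computed for one arc as in the following sketch. The function and feature names are illustrative, and field names follow the CoNLL 2007 format.

```python
# Illustrative computation of the pair features (direction and field
# combinations) and position features for an arc (parent, child).

def pair_and_position_features(tokens, parent, child):
    """tokens: list of dicts with LEMMA/CPOSTAG/FEATS keys;
    parent, child: token indices of the arc's head and dependent."""
    p, c = tokens[parent], tokens[child]
    return {
        "direction": "left" if child < parent else "right",
        "lemma|lemma": p["LEMMA"] + "|" + c["LEMMA"],      # lemma combination
        "plemma|ccpos": p["LEMMA"] + "|" + c["CPOSTAG"],
        "pcpos|clemma": p["CPOSTAG"] + "|" + c["LEMMA"],
        "feats|feats": p["FEATS"] + "|" + c["FEATS"],
        "child_first_word": child == 0,
        "child_last_word": child == len(tokens) - 1,
    }
```

Concatenating two fields with a separator is a simple way to give the SVM a conjoined feature that neither field provides alone.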
Because four languages have different labels for ROOT, we define special features for the ROOT labeler. The features are listed as follows:
• The individual features: the FORM, the LEMMA, the CPOSTAG, the POSTAG, and the FEATS of the parent and child nodes.
• The verb features: whether the child is the first or last verb in the sentence.
• The neighbor features: the combination of the CPOSTAG and LEMMA of the left and right neighbors of the parent and child, the number of children, and the CPOSTAG sequence of the children.
• The position features: whether the child is the first or last word in the sentence, and whether the child is the first word to the left or right of the parent.
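The verb and neighbor features for the ROOT labeler could be sketched as follows. Here the parent is the artificial ROOT node, so all features are computed on the child; tokens whose CPOSTAG starts with "V" are assumed to be verbs, which is a treebank-dependent assumption, and all names are illustrative.

```python
# Illustrative sketch of the verb and neighbor features for the ROOT
# labeler. The "V" prefix test for verbs is an assumption here.

def root_labeler_features(tokens, children, child):
    """tokens: dicts with CPOSTAG/LEMMA keys; children: indices of the
    child's own dependents; child: index of the candidate ROOT child."""
    verbs = [j for j, t in enumerate(tokens) if t["CPOSTAG"].startswith("V")]

    def ngb(j, field):   # neighbor field, or "NONE" outside the sentence
        return tokens[j][field] if 0 <= j < len(tokens) else "NONE"

    return {
        "child_first_verb": bool(verbs) and child == verbs[0],
        "child_last_verb": bool(verbs) and child == verbs[-1],
        "left_ngb": ngb(child - 1, "CPOSTAG") + "|" + ngb(child - 1, "LEMMA"),
        "right_ngb": ngb(child + 1, "CPOSTAG") + "|" + ngb(child + 1, "LEMMA"),
        "num_children": len(children),
        "children_cpos_seq": "+".join(tokens[j]["CPOSTAG"] for j in children),
    }
```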
3 Evaluation Results
We evaluated our system in the Multilingual Track for all languages. For the unlabeled parser, we chose the parameters for MaltParser based on performance on a held-out section of the training data. We also chose the parameters for YamCha based on performance on the training data. Our official results are shown in Table 1. Performance is measured by labeled accuracy and unlabeled accuracy.
These results show that our two-stage system achieves good performance. For all languages, our system provided better results than the average performance of all the systems (Nivre et al., 2007). Compared with the top 3 scores, our system performed slightly worse. One reason may be that we used only projective parsing algorithms, while all languages except Chinese have non-projective structures. Another reason is that we did not tune the parameters well due to lack of time.
[Table 1: The results of the proposed approach, per data set: LABELED ATTACHMENT SCORE (LA) and UNLABELED ATTACHMENT SCORE (UA). Table contents not recoverable from the source.]
For Chinese, the system achieved 81.24% labeled accuracy and 85.91% unlabeled accuracy. We also ran MaltParser to provide the labels. Besides the same features, we added the DEPREL features: the dependency type of TOP, the dependency type of the leftmost dependent of TOP, the dependency type of the rightmost dependent of TOP, and the dependency type of the leftmost dependent of NEXT. The labeled accuracy of MaltParser was 80.84%, 0.4% lower than that of our system.
4 Error Analysis

Some conjunctions, prepositions, and DE3 attached to their head words with much lower accuracy: 74% for DE, 76% for conjunctions, and 71% for prepositions. These words formed 19.7% of the test data.
For Chinese parsing, coordination and prepositional phrase attachment are hard problems. Chen et al. (2006) defined special features for coordination in chunking. In the future, we plan to define special features for these words.

3 Including the DE particles (the characters are illegible in the source).
[Table 2: The words where most of the errors occur in the Chinese data. Table contents not recoverable from the source.]
"... (museum)" was to be tagged as "predication" instead of "property". It was very hard to tell the labels between the words around "[word illegible in source]". Humans can make the distinction between property and predication for this word, because we have background knowledge of the words. So if we can incorporate such additional knowledge into the system, the system may assign the correct label. For "[word illegible in source]", it was hard to assign the head: 36 wrong heads out of all 38 errors. It often appeared in coordination expressions. For example, its head in "[text illegible] (Besides extreme cool and too amazing)" was [illegible], and its head in "[text illegible] (Give the visitors solid and methodical knowledge)" was "[illegible]".
5 Conclusion
In this paper, we presented our two-stage dependency parsing system submitted to the Multilingual Track of the CoNLL-2007 shared task. We used Nivre's method to produce the dependency arcs and a sequence labeler to produce the dependency labels. The experimental results showed that our system provides good performance for all languages.
