Three versions of the Covington algorithm for non-projective dependency parsing have been tested on the ten different languages of the Multilingual track of the CoNLL-X Shared Task. The results were achieved using only information about heads and daughters as features to guide the parser, which obeys strict incrementality.
1 Introduction
In this paper we focus on two things. First, we investigate the impact of using different flavours of Covington's algorithm (Covington, 2001) for non-projective dependency parsing on the ten different languages provided for the CoNLL-X Shared Task (Nivre et al., 2007). Second, we test the performance of a pure grammar-based feature model in a strictly incremental fashion. The grammar model relies only on knowledge of the heads and daughters of two given words, as well as the words themselves, in order to decide whether they can be linked with a certain dependency relation. In addition, none of the three parsing algorithms guarantees that the output dependency graph will be projective.
2 Covington's algorithm(s)
In his (2001) paper, Covington presents a "fundamental" algorithm for dependency parsing, which he claims has been known since the 1960s but had, up to the publication of his paper, not been presented systematically in the literature. We take three of its flavours, which enforce uniqueness (a.k.a. single-headedness) but do not observe projectivity. The algorithms work one word at a time and attempt to build a connected dependency graph in a single left-to-right pass through the input. The three flavours are: Exhaustive Search Head First with Uniqueness (ESHU), Exhaustive Search Dependents First with Uniqueness (ESDU) and List-based Search with Uniqueness (LSU).
The yes/no function HEAD?(w1, w2) checks whether a word w1 can be a head of a word w2 according to a grammar G; it also respects the single-head and no-cycle conditions. The LINK(w1, w2) procedure links word w1 as the head of word w2 with a dependency relation as proposed by G. When traversing Headlist and Wordlist, we start with the last word added.
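The single-pass structure shared by the flavours can be sketched as follows. This is an illustrative sketch, not the authors' C# implementation: the oracle `head_fn(h, d)` stands in for HEAD?, returning a dependency label if h can head d under the grammar and None otherwise, and the no-cycle check is assumed to live inside the oracle.

```python
def parse(words, head_fn):
    """One left-to-right pass; uniqueness (single-headedness) is enforced,
    projectivity is not."""
    head = {}      # dependent -> (head, label): at most one head per word
    wordlist = []  # words seen so far, traversed most-recent first
    for w in words:
        # Try to attach w as a dependent of an earlier word.
        for h in reversed(wordlist):
            label = head_fn(h, w)
            if label is not None:
                head[w] = (h, label)   # LINK(h, w)
                break
        # Try to attach earlier, still-headless words as dependents of w.
        for d in reversed(wordlist):
            if d not in head:
                label = head_fn(w, d)
                if label is not None:
                    head[d] = (w, label)   # LINK(w, d)
        wordlist.append(w)  # added only after all tests, as noted above
    return head
```

With a toy oracle that licenses "dog -> the" (det) and "barks -> dog" (subj), parsing "the dog barks" yields exactly those two links, with "barks" left as the root.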
Nivre (2007) describes an optimized version of Covington's algorithm implemented in MaltParser (Nivre, 2006) with a running time of c(n^2 - n)/2 for an n-word sentence, where c is some constant time in which the LINK operation can be performed. However, due to time constraints, we will not bring this version of the algorithm into focus; see some preliminary remarks on it with respect to our parsing model in Section 6.
[Pseudocode fragments from the algorithm figure: "foreach H in Wordlist ... terminate this foreach loop; if no head for W was found then ...".]
3 Classifier as an Instant Grammar
The HEAD? function in the algorithms presented in Section 2 requires an "instant grammar" (Covington, 2001) of some kind, which can tell the parser whether the two words under scrutiny can be linked and with what dependency relation. To satisfy this requirement, we use TiMBL, a memory-based learner (Daelemans et al., 2004), as a classifier to predict the relation (if any) holding between the two words.
Building heavily on the ideas of history-based parsing (Black et al., 1993; Nivre, 2006), training the parser essentially means running the parsing algorithms in a learning mode on the data in order to gather training instances for the memory-based learner. In the learning mode, the HEAD? function has access to a fully parsed dependency graph. In the parsing mode, the HEAD? function in the algorithms issues a call to the classifier using features from the parsing history (i.e. a partially built dependency graph PG).
1 Covington adds W to the Wordlist as soon as it has been seen; however, we have chosen to wait until after all tests have been completed.
The classifier then attempts to map this feature vector to one of the predefined classes. These are all the dependency relations, as defined by the treebank, plus the class "NO" for the cases where no link between the two words is possible.
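In the spirit of TiMBL's default settings used below (IB1, k = 1), the classifier's role can be illustrated with a toy memory-based learner: store every training instance, and classify a new vector by its single most similar stored instance under a simple overlap metric. This is a hedged sketch standing in for TiMBL, not TiMBL itself; the feature values are hypothetical.

```python
class MemoryBasedClassifier:
    """Toy IB1-style (k = 1) memory-based classifier."""

    def __init__(self):
        self.memory = []  # list of (feature_tuple, class_label)

    def train(self, features, label):
        # Training just memorizes the instance.
        self.memory.append((tuple(features), label))

    def classify(self, features):
        # Overlap metric: number of feature positions with equal values.
        def overlap(stored):
            return sum(a == b for a, b in zip(stored, features))
        best_features, label = max(self.memory, key=lambda m: overlap(m[0]))
        return label  # a dependency relation, or "NO" for no link
```

For example, after memorizing ( ["NN", "DT"], "det" ) and ( ["DT", "VB"], "NO" ), the vector ["NN", "DT"] is classified as "det".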
4 The Grammar model
The features used in our history-based model are restricted to the partially built graph PG. We call this a pure grammar-based model, since the only information the parsing algorithms have at their disposal is extracted from the graph, such as the head and daughters of the current word. Preceding words not included in PG, as well as words following the current word, are not available to the algorithm. In this respect such a model is very restrictive and suffers from the pitfalls of incremental processing (Nivre, 2004).
The motivation for the chosen model was to approximate a Data Oriented Parsing (DOP) model (e.g. Bod et al., 2003) for Dependency Grammar. Under DOP, analyses of new sentences are produced by combining previously seen tree fragments. However, the tree fragments under the original DOP model are static, i.e. we have a corpus of all possible subtrees derived from a treebank. Under our approach, these tree fragments are built dynamically as we try to parse the sentence. Because of the chosen DOP approximation, we have not included information about the preceding and following words of the two words to be linked in our feature model.
To exemplify our approach, (1) shows a partially built graph and all the words encountered so far, and Fig. 1 shows two examples of the tree-building operations for linking words f and d, and f and a.

(1) a b c d e f . . .
Given two words i and j to be linked with a dependency relation, such that word j precedes word i, the following features describe the models on which the algorithms have been trained and tested: ds(i) means any two daughters (if available) of word i; h(i/j) refers to the head of word i or word j, depending on the direction of applying the HEAD? function (see Fig. 1); and h(h(i/j)) stands for the head of the head of word i or word j.
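Extracting these features from the partial graph PG can be sketched as follows. This is an illustrative sketch: the graph representation (two dicts), the feature ordering, and the padding value "_" are our assumptions, not the authors' exact encoding.

```python
def features(i, j, head_of, daughters_of):
    """Build a feature vector for candidate pair (i, j) from the partial
    graph PG, given as head_of (dependent -> head) and
    daughters_of (head -> list of dependents)."""

    def h(w):
        # Head of w in PG, or padding if w is (still) headless.
        return head_of.get(w, "_")

    def ds(w):
        # Any two daughters of w, padded to length two.
        d = daughters_of.get(w, [])
        return (d + ["_", "_"])[:2]

    # The words themselves, ds(i), ds(j), h(i), h(j), h(h(i)), h(h(j)).
    return [i, j, *ds(i), *ds(j), h(i), h(j), h(h(i)), h(h(j))]
```

For instance, if PG so far links a under b, then for the pair (b, a) the vector records a as a daughter of b, b as the head of a, and padding everywhere the graph has no information yet.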
The basic model, which was used for the largest training data sets, Czech and Chinese, includes only the first four features in every category. A larger model, used for the datasets of Catalan and Hungarian, adds the h(j/i) feature from every category. The enhanced model, used for Arabic, Basque, English, Greek, Italian and Turkish, uses the full set of features. This tripartite division of models was motivated only by time and resource constraints. The simplest model, for Chinese, uses only 5 features, while the enhanced model, for Arabic for example, uses a total of 39 features.
5 Results and Setup
Table 1 summarizes the results of testing the three algorithms on the ten different languages. The parser was written in C#. Training and testing were performed on a Mac OS X 10.4.9 machine with a 2 GHz Intel Core 2 Duo processor and 1 GB of memory, and on a Dell Dimension with a 2.80 GHz Pentium 4 processor and 1 GB of memory running Mepis Linux.
TiMBL was run in client-server mode with default settings (IB1 learning algorithm, extrapolation from the most similar example, i.e. k = 1), initiated with the command "Timbl -S <portnumber> -f <training_file>".

Table 1: Test results for the 10 languages. LA is the Labelled Attachment Score and UA is the Unlabelled Attachment Score.
Additionally, we attempted to use Support Vector Machines (SVM) as an alternative classifier. However, due to the long training time, results from using SVM were not included, although training an SVM classifier for some of the languages has been started.
6 Discussion

Before we attempt a discussion of the results presented in Table 1, we give a short summary of the basic word order typology of these languages according to Greenberg (1963).
Table 2 shows whether the languages are SVO (subject-verb-object), SOV (subject-object-verb) or VSO (verb-subject-object); contain Pr (prepositions) or Po (postpositions); are NG (noun precedes genitive) or GN (genitive precedes noun); and AN (adjective precedes noun) or NA (noun precedes adjective).
2 Greenberg gave varying values for the word-order typology of English. However, we trusted our own intuition as well as the hint of one of the reviewers.
Table 2: Basic word order typology of the ten languages following Greenberg's Universals. [Table body lost in extraction; only the cells "English2" and "Hungarian" survive.]
Looking at the data in Table 1, several observations can be made. One is the different performance of languages from the same language family, i.e. Italian, Greek and Catalan.
However, the head-first (ESHU) algorithm performed better than the dependents-first (ESDU) one in all of these languages. The SOV languages, like Hungarian, Basque and Turkish, showed a preference for the dependents-first algorithms (ESDU and LSU). The ESDU algorithm also fared better with the SVO languages, except for Italian.
However, Greenberg's basic word order typology cannot shed enough light on the performance of the three parsing algorithms. One question that immediately arises is whether a different feature model using the same parsing algorithms would achieve similar results. Can the different performance be attributed to the treebank annotation? Would another classifier fare better than the memory-based one? These questions remain for future research.
Finally, for the Basque data we attempted to test the optimized version of the Covington algorithm (Nivre, 2007) against the three other versions discussed here. Additionally, since our feature vectors differed from those described in Nivre (2007) (head-dependent features vs. j-i features), we changed them so that all four algorithms send a similar feature vector, with j-i features, to the classifier.
The preliminary result was that Nivre's version was the fastest, with fewer calls to the LINK procedure and the smallest training data set. However, all four algorithms showed about a 20% decrease in LA/UA scores.
Our first intuition about the results from the tests on all 10 languages was that the classification task suffered from a highly skewed class distribution, since the training instances that correspond to a dependency relation are largely outnumbered by those of the "NO" class; the resulting accuracy was low, and we expected the classifier to be able to predict more of the required links.
However, the results we got from additional optimizations performed on Hungarian, following a recommendation from the anonymous reviewers, may lead to a different conclusion. The chosen grammar model, relying only on connecting dynamically built partial dependency graphs, is insufficient to take us over a certain threshold.
7 Conclusion

In this paper we showed the performance of three flavours of Covington's algorithm for non-projective dependency parsing on the ten languages provided for the CoNLL-X Shared Task (Nivre et al., 2007). The experiment showed that, given the grammar model we have adopted, it does matter which version of the algorithm one uses.
The chosen model, however, showed poor performance and suffered from two major flaws: the use of only partially built graphs and the purely incremental processing. It remains to be seen how these parsing algorithms will perform in a parser with a much richer feature model, and whether it is worth using different flavours when parsing different languages or whether the differences among them are insignificant.
Acknowledgements

We would like to thank the two anonymous reviewers for their valuable comments. We are grateful to Joakim Nivre for discussion on the Covington algorithm, Bertjan Busser for help with TiMBL, Antal van den Bosch for help with paramsearch, Matthew Johnson for providing the necessary functionality in his .NET implementation of SVM, and Patrycja Jablonska for discussion on Greenberg's Universals.
