This paper proposes a new method for the automatic acquisition of Chinese bracketing knowledge from English-Chinese sentence-aligned bilingual corpora. Bilingual sentence pairs are first aligned in syntactic structure by combining English parse trees with a statistical bilingual language model. Chinese bracketing knowledge is then extracted automatically. Preliminary experiments show that the automatically learned knowledge accords well with manually annotated brackets. The proposed method is particularly useful for acquiring bracketing knowledge for a less-studied language that lacks the tools and resources available for a second, better-studied language. Although this paper discusses experiments with Chinese and English, the method is also applicable to other language pairs.
Introduction
The past few years have seen great success in the automatic acquisition of monolingual parsing knowledge and grammars. The availability of large tagged and syntactically bracketed corpora, such as the Penn Treebank, makes it possible to extract syntactic structure and grammar rules automatically (Marcus 1993). Substantial improvements have been made in parsing Western languages such as English, and many powerful models have been proposed (Brill 1993, Collins 1997). However, very limited progress has been achieved for Chinese, and knowledge acquisition is a bottleneck for real applications of Chinese parsing.
While some methods have been proposed to learn syntactic knowledge from Chinese corpora, most of them depend on annotated or partially annotated data (Zhou 1997, Streiter 2000). Due to the limited availability of annotated Chinese corpora, tests of these methods are still small in scale. Although some institutions and universities are currently engaged in building Chinese treebanks, no large-scale annotated corpus has been published to date, because of the complexity of Chinese syntactic structure and the difficulty of corpus annotation (Chen 1996).
This paper proposes a novel method to facilitate Chinese treebank construction. Based on English-Chinese bilingual corpora and the more mature state of English parsing, this method obtains Chinese bracketing information automatically via a bilingual model and word alignment results. The main idea of the method is that we may acquire knowledge for a language lacking a rich collection of resources and tools from a second language that has them in abundance.
The rest of this paper is organized as follows. Section 1 introduces a bilingual language model. Section 2 proposes a bilingual parsing method supervised by English parsing. Based on this bilingual parsing, Chinese bracketing knowledge is extracted in Section 3. Evaluation and discussion are given in Section 4. We conclude with a discussion of future work.
1 A bilingual language model: ITG
Wu (1997) proposed a bilingual language model called Inversion Transduction Grammar (ITG), which can be used to parse bilingual sentence pairs simultaneously. We give a brief description here; for details, please refer to (Wu 1995, Wu 1997).
The Inversion Transduction Grammar is a bilingual context-free grammar that generates two matched output languages (referred to as L1 and L2). It differs from standard context-free grammars in that ITG allows the right-hand side of a production to be realized in two directions: straight or inverted. The following examples are two ITG productions:
$$C \rightarrow [A\,B] \qquad\qquad C \rightarrow \langle A\,B\rangle$$
Each nonterminal symbol stands for a pair of matched strings. For example, the nonterminal A stands for the string pair $(A_1, A_2)$, where $A_1$ is a substring in L1 and $A_2$ is the corresponding translation of $A_1$ in L2. Similarly, $(B_1, B_2)$ denotes the string pair generated by B. The operator $[\,]$ performs the usual concatenation, so that $C \rightarrow [A\,B]$ yields the string pair $(C_1, C_2)$, where $C_1 = A_1B_1$ and $C_2 = A_2B_2$. On the other hand, the operator $\langle\,\rangle$ performs straight concatenation for language 1 but reversed concatenation for language 2, so that $C \rightarrow \langle A\,B\rangle$ yields $C_1 = A_1B_1$ but $C_2 = B_2A_2$. The inverted concatenation operator permits the extra flexibility needed to accommodate many kinds of word-order variation between source and target languages (Wu 1995).
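To make the two operators concrete, the following Python sketch composes string pairs the way $C \rightarrow [A\,B]$ and $C \rightarrow \langle A\,B\rangle$ do. The function names and the illustrative English/Chinese phrase pairs are assumptions made for this example, not part of the original formulation.

```python
from typing import Tuple

# A string pair: (words in L1, words in L2).
StringPair = Tuple[Tuple[str, ...], Tuple[str, ...]]

def straight(a: StringPair, b: StringPair) -> StringPair:
    """C -> [A B]: concatenate A and B in the same order on both sides."""
    return (a[0] + b[0], a[1] + b[1])

def inverted(a: StringPair, b: StringPair) -> StringPair:
    """C -> <A B>: keep the L1 order, reverse the L2 order."""
    return (a[0] + b[0], b[1] + a[1])

# Illustrative phrase pairs (English as L1, Chinese as L2).
A = (("plays", "basketball"), ("打", "篮球"))
B = (("on", "Sunday"), ("星期日",))

print(straight(A, B))
print(inverted(A, B))  # the Chinese side places B's words before A's
```

The inverted case is what lets "plays basketball on Sunday" pair with a Chinese word order in which the time expression comes first.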
There are also lexical productions of the following form in ITG:
$$A \rightarrow x/y$$
This means that a symbol x in language L1 is translated by the symbol y in language L2. Either x or y may be the null symbol $\varepsilon$, which means that there may be no counterpart string on the other side of the bitext.
ITG-based parsing matches constituents for an input sentence pair. For example, Figure 1 shows an ITG parse tree for an English-Chinese sentence pair. An inverted production is indicated by a horizontal line in the parse tree. The English text is read in the usual depth-first, left-to-right order, but for the Chinese text, a horizontal line means that the right subtree is traversed before the left. The generated parsing results are:
We can also represent the common structure of the two sentences more clearly and compactly with the aid of $\langle\,\rangle$ notation, where the horizontal line from Figure 1 corresponds to the $\langle\,\rangle$ level of bracketing.
Figure 1: Inversion Transduction Grammar parsing
Any ITG can be converted to a normal form in which all productions are either lexical productions or binary-fanout nonterminal productions (Wu 1997). If a probability is associated with each production, the ITG is called a Stochastic Inversion Transduction Grammar (SITG).
2 English parsing supervised bilingual bracketing
Because of the difficulty of finding a suitable bilingual syntactic grammar for Chinese and English, a practical ITG is the generic Bracketing Inversion Transduction Grammar (BTG) (Wu 1995). BTG is a simplified ITG that has only one nonterminal and does not use any syntactic grammar.
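A Statistical BTG (SBTG) grammar is as follows:
$$
\begin{aligned}
A &\xrightarrow{a} [A\,A] \\
A &\xrightarrow{a} \langle A\,A\rangle \\
A &\xrightarrow{b_{ij}} u_i/v_j \\
A &\xrightarrow{b_{i\varepsilon}} u_i/\varepsilon \\
A &\xrightarrow{b_{\varepsilon j}} \varepsilon/v_j
\end{aligned}
$$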
SBTG employs only one nonterminal symbol, A, which can be used recursively. Here, $a$ denotes the probability of the syntactic rules. However, since constituent categories are not differentiated in BTG, it has no practical effect here and can be set to an arbitrary constant. The remaining productions are all lexical. $b_{ij}$ is the probability that source word $u_i$ translates into target word $v_j$; it can be obtained using a statistical word-translation model (Melamed 2000) or word alignment (Lu 2001a). The last two productions denote that a word in one language has no counterpart on the other side of the bitext. A small constant can be chosen for the probabilities $b_{i\varepsilon}$ and $b_{\varepsilon j}$.
In BTG, no language-specific syntactic grammar is used. The maximum-likelihood parser selects the parse tree that best satisfies the combined lexical translation preferences, as expressed by the $b_{ij}$ probabilities. Because the expressiveness characteristics of ITG naturally constrain the space of possible matchings in a highly appropriate fashion, BTG achieves encouraging results for bilingual bracketing using a word-translation lexicon alone (Wu 1997).
Since no syntactic knowledge is used in SBTG, output grammaticality cannot be guaranteed. In particular, if the corresponding constituents appear in the same order in both languages (or in exactly reversed order), then lexical matching does not provide the discriminative leverage needed to identify the sub-constituent boundaries: every straight (or every inverted) bracketing of the span receives the same lexical score.
For example, consider the following English-Chinese sentence pair:
(4) English: That old teacher is our adviser.
Chinese:
Using SBTG, the bilingual bracketing result is:
The result is not consistent with the expected syntactic structure. In this case, grammatical information about one or both of the languages can be very helpful. For example, if we know the English parsing result shown in (6), then the bilingual bracketing can be determined easily; the result should be (7). This example shows that if a parser is available for one of the languages, the induced bilingual bracketing will be more accurate.
English parsing methods have been well studied, and many powerful models have been proposed, so it is helpful to make use of English parsing results. In the following, we propose a method of bilingual bracketing supervised by English parsing.
Here, English-parsing-supervised BTG means using the bracketing information produced by an English parser as a boundary restriction in the BTG language model. However, this does not require Chinese to be parsed strictly according to the English parsing boundaries. If the English parsing structure were completely fixed, the resulting structure might not be linguistically valid for Chinese under the formalism of Inversion Transduction Grammar. To illustrate this, consider the example shown in Figure 2.
Figure 2: An example of a mismatched subtree (English sentence: "If you want to lose weight, you had better eat less bread.")
The subtrees for the bold underlined part of the English sentence and for the corresponding Chinese are shown in Figure 2(a). We can see that the Chinese constituents do not match their counterparts in the English structure.
In this case, our solution is to align the whole English "VP" constituent with its whole Chinese correspondence; i.e., "eat less bread" is matched with its Chinese translation as a whole, as shown in Figure 2(b). At the same time, we give the inner structure matching according to ITG, regardless of the English parsing constraint. An "X" tag is introduced to indicate that the bilingual sub-parse-tree is not consistent with the given English subtree. Our result can also be understood as a flattened bilingual parse tree, as shown in Figure 2(c). That is, when the bilingual constituents cannot be matched within a small syntactic structure, we match them within a larger one.
The main idea is that the given English parser is used only as a boundary constraint for bilingual parsing. When the constraint is incompatible with the bilingual ITG model, we use the ITG result as the default. This allows parsing to proceed despite some matching failures.
We heuristically define a constraint function $F_e(s,t)$ to denote the English boundary constraint, where s is the beginning position and t is the end position of an English span. There are three cases of structure matching: violate match, exact match and inside match. A violate match means that the bilingual parsing conflicts with the given English bracketing boundaries. For example, given the English bracketing result (8), the spans (1,2), (1,3), (2,3) and (3,5) are examples of violate match. The spans (3,4) and (4,5) are examples of inside match, and the value 1 is assigned to $F_e(s,t)$ for them.
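A minimal Python sketch of the span classification behind $F_e(s,t)$ follows. It assumes spans are (start, end) word-boundary positions and that a violate match receives $F_e = 0$; the text above only states the value 1 for inside matches, so the zero is our assumption, suggested by the multiplicative use of $F_e$ in the local optimization function. The bracket set is hypothetical.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int]  # (start, end) word-boundary positions, start < end

def crosses(a: Span, b: Span) -> bool:
    """True if the spans overlap without either one containing the other."""
    (s1, t1), (s2, t2) = a, b
    return s1 < s2 < t1 < t2 or s2 < s1 < t2 < t1

def classify(span: Span, brackets: Iterable[Span]) -> str:
    """Exact, violate or inside match of a span against English brackets."""
    brackets = list(brackets)
    if span in brackets:
        return "exact"
    if any(crosses(span, b) for b in brackets):
        return "violate"
    return "inside"

def f_e(s: int, t: int, brackets: Iterable[Span]) -> float:
    """Assumed constraint values: 0 for a violate match, 1 otherwise."""
    return 0.0 if classify((s, t), brackets) == "violate" else 1.0

# Hypothetical English bracketing of a five-word sentence.
english = [(0, 5), (0, 2), (2, 5), (3, 5)]
for span in [(2, 5), (1, 3), (3, 4)]:
    print(span, classify(span, english), f_e(*span, english))
```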
$S(s,t,u,v) = \max P[e_{s..t}/c_{u..v}]$ denotes the maximum probability of a sub-parse-tree rooted at node q such that both the substring $e_{s..t}$ and the substring $c_{u..v}$ derive from q. Thus, the best parse has probability $S(0,T,0,V)$. $S(s,t,u,v)$ is calculated by maximizing over all possible combinations of subtrees (Wu 1995).
To insert English parsing constraints into bilingual parsing, we integrate the constraint function $F_e(s,t)$ into the local optimization function. The computation of the local optimization function is then modified as follows:
$$S_{\langle\rangle}(s,t,u,v) = \max_{s \le S \le t,\; u \le U \le v} F_e(s,t)\, S(s,S,U,v)\, S(S,t,u,U)$$
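Initialization is as follows:
$$
\begin{aligned}
S(t-1,t,v-1,v) &= b(e_t/c_v), && 1 \le t \le T,\ 1 \le v \le V \\
S(t-1,t,v,v) &= b(e_t/\varepsilon), && 1 \le t \le T,\ 0 \le v \le V \\
S(t,t,v-1,v) &= b(\varepsilon/c_v), && 0 \le t \le T,\ 1 \le v \le V
\end{aligned}
$$
where T and V are the lengths of the English and Chinese sentences, respectively, and $b(e_t/c_v)$ is the probability of translating English word $e_t$ into Chinese word $c_v$. A minimal probability can be assigned to the empty word alignments $b(e_t/\varepsilon)$ and $b(\varepsilon/c_v)$.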
The optimal bilingual parse tree for a given sentence pair can be computed using a dynamic programming (DP) algorithm (Wu 1997).
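The following Python sketch shows one way such a DP could look: a memoized Viterbi search over the SBTG productions, with $F_e(s,t)$ applied as a hard 0/1 factor. Several details are our assumptions rather than the paper's: the constraint is applied to straight as well as inverted combinations, $a = 1$, a single constant null probability is used, ties are broken by search order, and there is no fallback to the unconstrained ITG result or the "X" tag when the constraint rules out every parse. The tuple tree encoding, the toy lexicon and the English brackets are hypothetical.

```python
from functools import lru_cache
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical tree encoding: ("lex", e_word, c_word) with None for the null
# symbol, ("[]", left, right) for straight, ("<>", left, right) for inverted.
Tree = tuple

def btg_parse(eng: List[str], chi: List[str],
              b: Dict[Tuple[str, str], float],
              brackets: Iterable[Tuple[int, int]],
              null_prob: float = 1e-4, a: float = 1.0) -> Tuple[float, Optional[Tree]]:
    """Viterbi SBTG bracketing with English brackets as a hard boundary
    constraint. Spans are word-boundary positions: e[s:t] paired with c[u:v]."""
    T, V = len(eng), len(chi)
    brackets = list(brackets)

    def f_e(s: int, t: int) -> float:
        # 0 if the English span (s, t) crosses a given English bracket, else 1.
        crossing = any(s < bs < t < bt or bs < s < bt < t for bs, bt in brackets)
        return 0.0 if crossing else 1.0

    @lru_cache(maxsize=None)
    def best(s: int, t: int, u: int, v: int) -> Tuple[float, Optional[Tree]]:
        ne, nc = t - s, v - u
        cands: List[Tuple[float, Optional[Tree]]] = [(0.0, None)]
        # Lexical productions: u_i/v_j, u_i/eps, eps/v_j (the initialization cases).
        if ne == 1 and nc == 1:
            cands.append((b.get((eng[s], chi[u]), 0.0), ("lex", eng[s], chi[u])))
        elif ne == 1 and nc == 0:
            cands.append((null_prob, ("lex", eng[s], None)))
        elif ne == 0 and nc == 1:
            cands.append((null_prob, ("lex", None, chi[u])))
        fe = f_e(s, t)
        if ne + nc >= 2 and fe > 0.0:
            for S in range(s, t + 1):
                for U in range(u, v + 1):
                    # Straight: (e[s:S], c[u:U]) followed by (e[S:t], c[U:v]).
                    if (S - s) + (U - u) > 0 and (t - S) + (v - U) > 0:
                        p1, t1 = best(s, S, u, U)
                        p2, t2 = best(S, t, U, v)
                        cands.append((fe * a * p1 * p2, ("[]", t1, t2)))
                    # Inverted: e[s:S] pairs with c[U:v], e[S:t] with c[u:U].
                    if (S - s) + (v - U) > 0 and (t - S) + (U - u) > 0:
                        p1, t1 = best(s, S, U, v)
                        p2, t2 = best(S, t, u, U)
                        cands.append((fe * a * p1 * p2, ("<>", t1, t2)))
        # max keeps the first candidate among equal probabilities.
        return max(cands, key=lambda c: c[0])

    return best(0, T, 0, V)

eng = ["plays", "basketball", "on", "Sunday"]
chi = ["星期日", "打", "篮球"]
b = {("plays", "打"): 0.8, ("basketball", "篮球"): 0.9, ("Sunday", "星期日"): 0.9}
english_brackets = [(0, 4), (2, 4)]  # hypothetical: the whole VP and the PP "on Sunday"
prob, tree = btg_parse(eng, chi, b, english_brackets)
print(prob, tree[0])  # the top node is inverted
```

Here the bracket (2, 4) rules out attaching the unaligned "on" to "plays basketball", an ambiguity the lexicon alone cannot resolve. The sketch multiplies raw probabilities, so long sentences would underflow; a practical version would work in log space. It visits $O(T^2V^2)$ spans with $O(TV)$ split points each.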
Using the standard SBTG local optimization function, the bilingual parsing result obtained for sentence pair (4) is shown as example (5); using the modified local optimization function above, the parsing result is the one shown as example (7). Comparing the two results, we can see that by integrating English parsing constraints into BTG, the bilingual parsing becomes more grammatical. Our experiments showed that this English-parsing-supervised BTG improves the accuracy of bilingual bracketing by nearly 20% (Lu 2001b).
The obtained bilingual parse tree is in the normal form of ITG; that is, each node in the tree is either a lexical node or a binary-fanout nonterminal node. We can combine subtrees to restore fanout flexibility using the properties $[[A\,A]\,A] = [A\,[A\,A]] = [A\,A\,A]$ and $\langle\langle A\,A\rangle\,A\rangle = \langle A\,\langle A\,A\rangle\rangle = \langle A\,A\,A\rangle$. The combining operation cannot cross the given English parsing boundaries.
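A small Python sketch of this fanout-restoring step is given below. It assumes a hypothetical tuple encoding for binary trees: ("lex", e_word, c_word) for lexical nodes, ("[]", left, right) for straight nodes and ("<>", left, right) for inverted nodes. A child is absorbed into a parent of the same orientation unless the child's English span is one of the given English brackets, which is our reading of "cannot cross the given English parsing boundaries".

```python
from typing import Set, Tuple

def flatten(tree: tuple, brackets: Set[Tuple[int, int]], start: int = 0) -> Tuple[tuple, int]:
    """Merge nested nodes of the same orientation into n-ary nodes.

    Input nodes: ("lex", e_word, c_word) with None for the null symbol,
    ("[]", left, right) or ("<>", left, right). Output internal nodes are
    (op, [children]). Returns the new tree and the number of English words covered.
    """
    if tree[0] == "lex":
        return tree, (0 if tree[1] is None else 1)
    op, left, right = tree
    children = []
    pos = start
    for child in (left, right):
        flat, n = flatten(child, brackets, pos)
        # Absorb a same-orientation child unless its English span is a given
        # English constituent, so merging never erases an English bracket.
        if flat[0] == op and (pos, pos + n) not in brackets:
            children.extend(flat[1])
        else:
            children.append(flat)
        pos += n
    return (op, children), pos - start

A, B, C = ("lex", "e1", "c1"), ("lex", "e2", "c2"), ("lex", "e3", "c3")
binary = ("[]", ("[]", A, B), C)
print(flatten(binary, set())[0])     # one straight node over A, B, C
print(flatten(binary, {(0, 2)})[0])  # (0, 2) is an English bracket, so it is kept
```

Because both operators are associative in the sense stated above, the merge changes only the tree shape, not the string pair it yields.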
3 Chinese bracketing knowledge extraction
Table 1 shows some bilingual bracketing examples obtained using the above method. For ease of understanding, the tree form of the first example is given in Figure 3(a). The leaf nodes are the aligned words of the two languages together with their POS tag categories. These POS tags are generated by an English and a Chinese POS tagger, respectively. The English POS tag and phrase tag sets are the same as those of the Penn Treebank (Marcus 1993); for the Chinese POS tag set, please refer to the website http://mtlab.hit.edu.cn. The nonterminal nodes are labeled using English subtree tags.
Based on the bilingual parsing result, it is easy to extract the Chinese bracketing structure according to the Inversion Transduction Grammar. For a straight node, the Chinese text is traversed in depth-first, left-to-right order, but for an inverted node (indicated by a horizontal line in the parse tree or by $\langle\,\rangle$ notation in the bracketing expression), the right subtree is traversed before the left. Thus, the Chinese parse tree corresponding to Figure 3(a) is the one shown in Figure 3(b). The nonterminal labels are derived from the English subtree.
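The traversal rule can be sketched in Python as follows. The `Node`/`Leaf` classes, the labeled-bracket output format and the example sentence pair are illustrative assumptions; POS tags are omitted, and leaves whose Chinese side is the null symbol are dropped.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass
class Leaf:
    eng: Optional[str]  # None stands for the null symbol
    chi: Optional[str]

@dataclass
class Node:
    label: str          # English phrase tag, reused to label the Chinese bracket
    inverted: bool      # True for a <> node (horizontal line in the tree)
    children: List[Union["Node", Leaf]] = field(default_factory=list)

def chinese_brackets(tree: Union[Node, Leaf]) -> str:
    """Read off the Chinese side of a bilingual tree as a labeled bracketing."""
    if isinstance(tree, Leaf):
        return tree.chi or ""  # a null-aligned Chinese side contributes nothing
    # Inverted nodes are read right to left on the Chinese side.
    kids = tree.children[::-1] if tree.inverted else tree.children
    parts = [p for p in (chinese_brackets(k) for k in kids) if p]
    return f"[{tree.label} {' '.join(parts)}]" if parts else ""

# Illustrative pair: "He plays basketball on Sunday" / 他 星期日 打 篮球
tree = Node("S", False, [
    Leaf("He", "他"),
    Node("VP", True, [
        Node("VP", False, [Leaf("plays", "打"), Leaf("basketball", "篮球")]),
        Node("PP", False, [Leaf("on", None), Leaf("Sunday", "星期日")]),
    ]),
])
print(chinese_brackets(tree))  # [S 他 [VP [PP 星期日] [VP 打 篮球]]]
```

In this toy output the English label PP ends up on a Chinese time noun, the same kind of tag mismatch that the discussion of Table 2 notes; the bracket boundaries are what matter here.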
The extracted Chinese bracketing results for Table 1 are listed in Table 2.
Table 1: Bilingual bracketing examples
Table 2: The extracted Chinese bracketing results corresponding to Table 1
Figure 3: Extracting Chinese bracketing structure from bilingual parsing. (a) Bilingual parsing result supervised by English parsing. (b) The Chinese parsing result extracted from (a).
It can be seen from Table 2 that the automatically acquired bracketing results reflect Chinese structure well, although some English phrase tags are not suitable for labeling the corresponding Chinese phrases directly. For example, in Table 2, the English tags "PP" (prepositional phrase) in sentence 1 and "SBAR" (clause) in sentence 4 incorrectly label the corresponding Chinese structures.
We are not concerned with the phrase tags here; our main concern is the bracketing boundaries of the syntactic structure. Bracketing boundary knowledge has proved valuable for Chinese grammar induction (Zhou 1997). The advantage of our method is that this knowledge is acquired automatically from a bilingual corpus, which reduces the manual labour of corpus tagging, a time-consuming and error-prone task.
4 Evaluation and discussion
To evaluate the quality of the acquired Chinese bracketing boundaries, we compared them with parsing annotations based on an existing Chinese syntax annotation scheme. Details of the Chinese syntax annotation scheme and an annotated corpus can be downloaded from the website http://mtlab.hit.edu.cn.
The test set consisted of 3,000 English-Chinese bilingual sentence pairs taken from the machine translation evaluation corpus (Duan 1996). The average length is 9.1 words for English sentences and 12.6 Chinese characters for Chinese sentences. The test sentence pairs were first aligned at the word level using statistics and a lexicon, with an accuracy of nearly 90% (Lu 2001a). The English and Chinese sentences were parsed based on the Penn Treebank tag set and the Chinese syntax annotation scheme, respectively. Both the English and the Chinese parsing results were manually corrected, and the corrected Chinese parsing results are used as the standard test set.
We acquired Chinese bracketing results using the proposed method. The previously defined exact match, violate match and inside match are used to evaluate the agreement between the acquired bracketing results and the standard parsing results. Here, an exact match means the acquired structure is the same as the standard structure, and a violate match means the acquired structure conflicts with the standard structure. Otherwise, the acquired structure is called an inside match.
In example (9), A is the standard bracketing result, B is the acquired bracketing result, and C shows the classification of the acquired structures. The structure of the whole sentence is excluded from the evaluation. The exact match rate (EMR), violate match rate (VMR) and inside match rate (IMR) denote the proportions of the three types of brackets among all brackets, respectively.
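A Python sketch of how EMR, VMR and IMR could be computed over a test corpus is shown below. It assumes each sentence supplies its acquired and standard brackets as (start, end) spans together with its length; the classification follows the definitions above (identical span, crossing span, otherwise inside). The toy data are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Span = Tuple[int, int]

def match_rates(corpus: Iterable[Tuple[List[Span], List[Span], int]]) -> Dict[str, float]:
    """EMR/VMR/IMR over (acquired, standard, sentence_length) triples."""
    counts: Counter = Counter()
    for acquired, standard, length in corpus:
        std = set(standard)
        for s, t in acquired:
            if (s, t) == (0, length):
                continue  # the whole-sentence bracket is not evaluated
            if (s, t) in std:
                counts["exact"] += 1
            elif any(s < a < t < b or a < s < b < t for a, b in std):
                counts["violate"] += 1  # crosses a standard bracket
            else:
                counts["inside"] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return {"EMR": counts["exact"] / total,
            "VMR": counts["violate"] / total,
            "IMR": counts["inside"] / total}

# Toy six-word sentence (invented data).
standard = [(0, 6), (0, 2), (2, 6), (3, 6)]
acquired = [(0, 6), (0, 2), (2, 4), (3, 5), (1, 3)]
rates = match_rates([(acquired, standard, 6)])
print(rates)  # {'EMR': 0.25, 'VMR': 0.5, 'IMR': 0.25}
```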
Table 3 gives the evaluation results. Results for the acquired Chinese structures corresponding to six main English phrase types (BNP, NP, VP, ADJP, ADVP and PP) are also given in detail.
Table 3: Evaluation of acquired Chinese bracketing results
From the results we can see that only a small fraction of the learned structures are violate matches (14.03%), while most are exact matches (55.46%). In addition, there are many inside matches (the remaining 30.51%). These inside matches occur because the Penn Treebank and the standard Chinese annotation scheme follow different conventions for phrase merging. English phrase structures are labeled in more detail, whereas in Chinese the main phrases at the sentence level are not merged further; for example, the verb and its object at the sentence level are not combined. That is why most verb phrases (VP) are inside matches (53.28%). The bracketing boundary of an inside match can be either right or wrong. We checked the correctness of the inside matches manually and obtained an average accuracy of 79.37%. The accuracy over all acquired bracketing structures is then 79.68% (EMR + IMR × accuracy of IM).
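The overall figure can be checked arithmetically. IMR is not reported explicitly, so this Python snippet derives it as 100 - EMR - VMR, assuming every acquired bracket falls into exactly one of the three classes.

```python
emr, vmr = 55.46, 14.03    # reported exact and violate match rates (%)
imr = 100.0 - emr - vmr    # assumed: the three classes partition all brackets
acc_inside = 79.37         # manually checked accuracy of inside matches (%)

overall = emr + imr * acc_inside / 100.0
print(f"IMR = {imr:.2f}%, overall accuracy = {overall:.2f}%")
# IMR = 30.51%, overall accuracy = 79.68%
```

Exact matches count as fully correct, and only the manually verified share of inside matches is added to them.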
The violate matches acquired in bilingual parsing are mainly due to empty word alignments, for example in special Chinese structures such as "IE...". The words "IC" and "-ft" have no counterpart in English; they are usually merged with the neighboring noun, as shown in example (10), which leads to a violate match. Special patterns need to be built to handle these structures. Word alignment errors also produce violate matches in bilingual bracketing.
Chinese annotated training corpus, which is difficult to accumulate.
Another advantage of our method is that the Chinese bracketing result is derived from English parsing and a parallel corpus, which makes it particularly useful for research on the correspondence between Chinese and English phrases. In (Lu 2001b), we used bilingual bracketing results for the automatic acquisition of translation templates, which turned out to be very useful for structure transfer in machine translation.
In addition, the acquired bracketing corpus can be applied to many Chinese NLP tasks. It can serve as the foundation for further Chinese treebank annotation, saving a great deal of human labour. It can also be used to improve the efficiency and accuracy of Chinese grammar induction (Zhou 1997). Grammar rules can also be extracted from the bracketing corpus.
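As a sketch of such rule extraction, the Python code below parses a labeled bracketing string and counts one context-free rule per bracket, using POS tags as the right-hand-side symbols for words. The bracket string format, the "word/tag" leaf convention and the tag names (r, t, v, n) are hypothetical and do not reflect the actual Chinese annotation scheme.

```python
from collections import Counter
from typing import List, Tuple

def tokenize(text: str) -> List[str]:
    return text.replace("[", " [ ").replace("]", " ] ").split()

def parse(tokens: List[str], i: int = 0) -> Tuple[tuple, int]:
    """Parse "[LABEL item ...]" starting at tokens[i]; return (tree, next index)."""
    assert tokens[i] == "[", "a bracket must start with '['"
    label, i = tokens[i + 1], i + 2
    children = []
    while tokens[i] != "]":
        if tokens[i] == "[":
            child, i = parse(tokens, i)
            children.append(child)
        else:
            children.append(tokens[i])  # a "word/tag" leaf
            i += 1
    return (label, children), i + 1

def extract_rules(tree: tuple, rules: Counter) -> None:
    """Count one rule LABEL -> child symbols per bracket (POS tags for words)."""
    label, children = tree
    rhs = []
    for child in children:
        if isinstance(child, tuple):
            rhs.append(child[0])
            extract_rules(child, rules)
        else:
            rhs.append(child.rsplit("/", 1)[-1])
    rules[(label, tuple(rhs))] += 1

tree, _ = parse(tokenize("[S 他/r [VP [PP 星期日/t] [VP 打/v [BNP 篮球/n]]]]"))
rules: Counter = Counter()
extract_rules(tree, rules)
for (lhs, rhs), n in rules.items():
    print(lhs, "->", " ".join(rhs), n)
```

Run over a whole acquired corpus, the counts could serve as raw frequencies for estimating rule probabilities.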
For example, we can obtain the following BNP rules from the acquired bracketing results in Table 2:
Conclusion
In this paper, we have presented a method to learn Chinese syntactic structure from English parsing based on a bilingual language model. The method creates structurally bracketed Chinese corpora automatically by taking full advantage of English parsing and bilingual corpora. The created corpora are very useful for further Chinese corpus annotation and parsing knowledge acquisition. Preliminary experiments demonstrated the feasibility and validity of the method. Although this paper deals with Chinese and English, the method is also applicable to other language pairs. If the two languages come from the same language family, such as English and French, the method can be expected to be even more effective.
Acknowledgements
This research was funded by the High Technology Research and Development Program of China (2001AA114101). We would also like to thank the Institute of Computational Linguistics at Peking University for providing bilingual corpora for testing.
