Abstract

Previous machine learning techniques for answer selection in question answering (QA) have required question-answer training pairs. Collecting these training pairs, however, is expensive and labor-intensive. This paper presents a novel unsupervised support vector machine (U-SVM) classifier for answer selection that is independent of language and does not require hand-tagged training pairs. The key ideas are the following: 1. unsupervised learning of training data for the classifier by clustering web search results; and 2. selecting the correct answer from the candidates by classifying the question. Comparative experiments demonstrate that the proposed approach significantly outperforms the retrieval-based model (Retrieval-M), the supervised SVM classifier (S-SVM), and the pattern-based model (Pattern-M) for answer selection. Moreover, the cross-model comparison shows that the performance ranking of these models is: U-SVM > Pattern-M > S-SVM > Retrieval-M.

1 Introduction

The purpose of answer selection in QA is to select the exact answer to the question from the extracted candidate answers. In recent years, many supervised machine learning techniques for answer selection in open-domain question answering have been investigated in pioneering studies [Ittycheriah et al. 2001; Ng et al. 2001; Suzuki et al. 2002; Sasaki et al. 2005; Echihabi et al. 2003]. Compared with retrieval-based [Yang et al. 2003], pattern-based [Ravichandran et al. 2002; Soubbotin et al. 2002], and deep NLP-based [Moldovan et al. 2002; Hovy et al. 2001; Pasca et al. 2001] answer selection, machine learning techniques are more effective for constructing QA components from scratch. These techniques suffer, however, from the requirement of an adequate number of hand-tagged question-answer training pairs, which are too expensive and labor-intensive to collect.

To tackle this knowledge acquisition bottleneck, this paper presents an unsupervised SVM classifier for answer selection, which is independent of language and question type and avoids the need for hand-tagged question-answer pairs. The key ideas are as follows:

• Regarding answer selection as a kind of classification task and adopting an SVM classifier;
• Applying unsupervised learning of pseudo-training data for the SVM classifier by clustering web search results;
• Training the SVM classifier by using three types of features extracted from the pseudo-training data; and
• Selecting the correct answer from the candidate answers by classifying the question.

Note that this means classifying a question into one of the clusters learned by clustering web search results.
Therefore, our "classifying the question" is different from conventional question classification (QC) [Li et al. 2002], which determines the answer type of the question.

The proposed approach is fully unsupervised and starts from only a user question. It does not require richly annotated corpora or any deep linguistic tools. To the best of our knowledge, no research of the kind discussed here has been reported.

[Figure 1: Web Question Answering Architecture]

Figure 1 illustrates the architecture of our web QA approach. The S-SVM and Pattern-M models are included for comparison.

Because this paper focuses on evaluating only the answer selection component, our approach assumes that the answer type of the question is known in order to find candidate answers, and that the answer is a named entity (NE), for convenience in candidate extraction.

Experiments using Chinese versions of the TREC 2004 and 2005 test data sets show that our approach significantly outperforms the S-SVM for answer selection, with a top_1 score improvement of more than 20%.
Results obtained with the test data set in [Wu et al. 2004] and the cross-model comparison demonstrate that the performance ranking of all models considered is: U-SVM > Pattern-M > S-SVM > Retrieval-M.

2 Comparison among Models

Related research on answer selection in QA can be classified into four categories.

The retrieval-based model [Yang et al. 2003] selects a correct answer from the candidates according to the distance between a candidate and all question keywords. This model does not work, however, if the question and the answer-bearing sentences do not match on the surface.

The pattern-based model [Ravichandran et al. 2002; Soubbotin et al. 2002] first classifies the question into predefined categories, and then extracts the exact answer by using answer patterns learned off-line. Although the pattern-based model can obtain high precision for some predefined types of questions, it is difficult to define question types in advance for open-domain question answering. Furthermore, this model is not suitable for all types of questions.

The deep NLP-based model [Moldovan et al. 2002; Hovy et al. 2001; Pasca et al. 2001] usually parses the user question and an answer-bearing sentence into a semantic representation, and then semantically matches them to find the answer. This model has performed very well at the TREC workshops, but it depends heavily on high-performance NLP tools, which are time-consuming and labor-intensive to build for many languages.

Finally, the machine learning-based model has also been investigated. Current models of this type are based on supervised approaches [Ittycheriah et al. 2001; Ng et al. 2001; Suzuki et al. 2002; Sasaki et al. 2005] that are heavily dependent on hand-tagged question-answer training pairs, which are not readily available.

In response to this situation, this paper presents the U-SVM for answer selection in an open-domain web question answering system. Our U-SVM has the following advantages over supervised machine learning techniques.

First, the U-SVM classifies questions into a question-dependent set of clusters, and the answer is the name of a question cluster. In contrast, most previous models have classified candidates into positive and negative.

Second, the U-SVM automatically learns the unique question-dependent clusters and the pseudo-training data for each question.
This differs from the supervised techniques, in which a large number of hand-tagged training pairs are shared by all of the test questions. In addition, supervised techniques independently process the answer-bearing sentences, so the answers to the questions may not always be extractable because of algorithmic limitations.

On the other hand, the U-SVM can use the interdependence between answer-bearing sentences to select the answer to a question.

Table 1 compares the key idea and training data used in the U-SVM with those used in the supervised machine learning techniques. Here, ME means the maximum entropy model, and N-C means the noisy-channel model.

Table 1: Comparison of Various Machine Learning Techniques

  Model           | Key Idea                                                              | Training Data
  SVM classifier  | Classifying candidates into positive and negative                     | Hand-tagged question-answer pairs
  N-C model       | Selecting the correct answer by aligning the question with sentences  | Hand-tagged question-answer pairs
  ME classifier   | Classifying words in sentences into answer and non-answer words       | Hand-tagged question-answer pairs
  Our U-SVM model | Classifying the question into a set of question-dependent clusters    | Pseudo-training data learned by clustering web search results

3 The U-SVM

The essence of the U-SVM is to regard answer selection as a kind of text categorization-like classification task, but with no training data available. In the U-SVM, the steps of "clustering web search results", "classifying the question", and "training the SVM classifier" play very important roles.

3.1 Clustering Web Search Results

Web search results, such as snippets returned by Google, usually include a mixture of multiple subtopics (called clusters in this paper) related to the user question.

To group the web search results into clusters, we assume that the candidate answer in each Google snippet can represent the "signature" of its cluster. In other words, the Google snippets containing the same candidate are regarded as aligned snippets, and thus belong to the same cluster.

Web search results are clustered in two phases.

• A first-stage Google search (FGS) groups the snippets retrieved for the question into initial clusters. If a snippet includes L different candidates, the snippet belongs to L different clusters. If the candidates of different snippets are the same, these snippets belong to the same cluster. Consequently, the number of clusters {Ci} is fully determined by the number of candidates {ci}, and the cluster name of a cluster Ci is the candidate answer ci.

Up to this point, we have obtained clusters and sample snippets for each cluster that will be used as training data for the SVM classifier. Because this training data is learned automatically, rather than hand-tagged, we call it pseudo-training data.

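To make this first clustering phase concrete, here is a minimal Python sketch (an illustration under our own simplifying assumptions, not the authors' implementation) in which snippets and candidate answers are plain strings:

    from collections import defaultdict

    def cluster_snippets(snippets, candidates):
        # FGS-style clustering: a snippet joins the cluster of every
        # candidate answer it contains, so a snippet with L different
        # candidates belongs to L different clusters.
        clusters = defaultdict(list)
        for snippet in snippets:
            for candidate in candidates:
                if candidate in snippet:
                    clusters[candidate].append(snippet)
        # The name of each cluster C_i is the candidate answer c_i.
        return dict(clusters)
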
• A second-stage Google search (SGS) is applied to resolve data sparseness in the pseudo-training samples learned through the FGS. The FGS data may have very few training snippets in some clusters, so more snippets must be collected.

Note that this step only adds new Google snippets to the clusters learned by the FGS; it does not add new clusters.

For each candidate answer ci:
  Construct a new query qi' = {q, ci}.
  Submit qi' to Google and download the top 50 Google snippets.
  Retain the snippets containing the candidate ci and at least one question keyword.
  Group the retained snippets into the n clusters to form the new pseudo-training data.
End

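Under the same assumptions as the sketch above, the SGS loop can be written as follows; the search callable standing in for the Google API is hypothetical:

    def second_stage_search(question, keywords, clusters, search):
        # For each candidate c_i, issue the expanded query {q, c_i},
        # keep snippets containing c_i and at least one question
        # keyword, and add them to c_i's existing cluster; no new
        # clusters are created.
        for candidate in list(clusters):
            query = question + " " + candidate
            for snippet in search(query, top=50):  # hypothetical search API
                if candidate in snippet and any(k in snippet for k in keywords):
                    clusters[candidate].append(snippet)
        return clusters
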
Here, we give an example illustrating the principle of clustering web search results in the FGS. Submitting TREC 2004 test question 1.1, "When was the first Crip gang started?", to Google (http://www.google.com/apis), we extract n (= 8) different candidates from the top m (= 30) Google snippets.
The Google snippets containing the same candidates are aligned snippets, and thus the 12 retained snippets are grouped into 8 clusters, as listed in Table 2.

This table roughly indicates that the snippets with the same candidate answers contain the same sub-meanings, so these snippets are considered aligned snippets. For example, all Google snippets that contain the candidate answer 1969 express the time of establishment of "the first Crip gang".

In summary, the U-SVM uses the result of "clustering web search results" as the pseudo-training data of the SVM classifier, and then classifies the user question into one of the clusters for answer selection. On the one hand, the clusters and their names are based on the candidate answers to the question; on the other hand, the candidates are dependent on the question. Therefore, the clusters are question-dependent.

3.2 Classifying the Question

Using the pseudo-training data obtained by clustering web search results to train the SVM classifier, we classify the user question into a set of question-dependent clusters and assume that the correct answer is the name of the question cluster assigned by the trained U-SVM classifier. For the above example, if the U-SVM classifier, trained on the pseudo-training data listed in Table 2, classifies the test question into the cluster whose name is 1969, then the cluster name 1969 is the answer to the question.

This paper uses the LIBSVM toolkit to implement the SVM classifier. The kernel is the radial basis function with the parameter γ = 0.001 in the experiments.

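As a rough illustration of this classification step, the sketch below uses scikit-learn's SVC (which wraps LIBSVM) with the RBF kernel and γ = 0.001; the binary bag-of-words features are our own simplification, standing in for the paper's three feature sets:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import SVC

    def classify_question(question, clusters):
        # Pseudo-training data: each snippet is labeled with the name
        # of its cluster, i.e. a candidate answer.
        texts, labels = [], []
        for name, snippets in clusters.items():
            for snippet in snippets:
                texts.append(snippet)
                labels.append(name)
        vectorizer = CountVectorizer(binary=True)
        X = vectorizer.fit_transform(texts)
        classifier = SVC(kernel="rbf", gamma=0.001)
        classifier.fit(X, labels)
        # The predicted cluster name is taken as the answer.
        return classifier.predict(vectorizer.transform([question]))[0]
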
3.3 Feature Extraction

To classify the question into a question-dependent set of clusters, the U-SVM classifier extracts three types of features.

• A similarity-based feature set (SBFS) is extracted from the Google snippets. The SBFS attempts to capture the word overlap between a question and a snippet. The possible values range from 0 to 1.

SBFS features:
  - percentage of matched keywords (KWs)
  - percentage of mismatched KWs
  - percentage of matched bi-grams of KWs
  - percentage of matched thesaurus entries
  - normalized distance between the candidate and the question keywords

To compute the matched thesaurus feature, we adopt TONGYICICILIN in the experiments.

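The overlap features can be sketched as below, assuming simple whitespace tokenization; the thesaurus and distance features, which need TONGYICICILIN and the candidate's position, are omitted:

    def sbfs_features(keywords, snippet):
        # Similarity-based features: overlap percentages in [0, 1].
        tokens = set(snippet.split())
        matched = sum(1 for k in keywords if k in tokens)
        bigrams = list(zip(keywords, keywords[1:]))
        matched_bi = sum(1 for a, b in bigrams if a + " " + b in snippet)
        n = max(len(keywords), 1)
        return {
            "pct_matched_kws": matched / n,
            "pct_mismatched_kws": 1.0 - matched / n,
            "pct_matched_bigrams": matched_bi / max(len(bigrams), 1),
        }
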
• A Boolean match-based feature set (BMFS) is also extracted from the Google snippets. The BMFS attempts to capture the specific keyword Boolean matches between a question and a snippet. The possible values are true or false.

BMFS features:
  - person names are matched or not
  - location names are matched or not
  - organization names are matched or not
  - time words are matched or not
  - number words are matched or not
  - the root verb is matched or not
  - the candidate has or does not have a bi-gram in the snippet matching a bi-gram in the question
  - the candidate has or does not have the desired named entity type

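The named-entity matches can be sketched as set intersections, assuming an NER step has already mapped each text to its typed items; the root-verb and bi-gram features would be analogous:

    def bmfs_features(question_items, snippet_items,
                      candidate_type, desired_type):
        # question_items / snippet_items map an item type (e.g. "PER",
        # "LOC", "ORG", "TIM", "NUM") to the set of such items found
        # in the question / snippet.
        feats = {}
        for t in ("PER", "LOC", "ORG", "TIM", "NUM"):
            feats[t + "_matched"] = bool(
                question_items.get(t, set()) & snippet_items.get(t, set()))
        # Whether the candidate has the desired named entity type.
        feats["desired_ne_type"] = (candidate_type == desired_type)
        return feats
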
• A window-based word feature set (WWFS) is a set of words consisting of the words {w_{i-5}, ..., w_{i-1}, w_{i+1}, ..., w_{i+5}} in a window surrounding the candidate answer.

Table 2: Clustering Web Search Results

  Cluster Name | Google Snippet
  1969  | It is believed that the first Crip gang was formed in late 1969. During this time in Los Angeles there were ...
  1969  | ... the first Bloods and Crips gangs started forming in Los Angeles in late 1969, the Island Bloods sprung up in north Pomona ...
  2004  | 2004 main 1 Crips 1.1 FACTOID When was the first Crip gang started? 1.2 FACTOID What does the name mean or come ...
  1972  | One of the first-known and publicized killings by Crip gang members occurred at the Hollywood Bowl in March 1972.
  1971  | Williams joined Washington in 1971, forming the westside faction of what had come to be called the Crips.
  1971  | The Crips gang formed as a kind of community watchdog group in 1971 after the demise of the Black Panthers ...
  1969  | ... formed by 16 year old Raymond Lee Washington in 1969. Williams joined Washington in 1971 ... had come to be called the Crips. It was initially started to eliminate all street gangs ...
  1982  | Oceanside police first started documenting gangs in 1982, when five known gangs were operating in the city: the Posole Locos ... Street Locos; Deep Valley Bloods and Deep Valley Crips. By the mid-1990s, gang violence had ...
  1970s | The Blood gangs started up as opposition to the Crips gangs, also in the 1970s, and the rivalry stands to this day ...

The WWFS features can be regarded as a kind of relevant-snippets-based question keyword expansion. By extracting the WWFS feature set, the feature space of the U-SVM becomes question-dependent, which may be more suitable for classifying the question.
The number of classification features in the S-SVM must be fixed, however, because all questions share the same training data. This is one difference between the U-SVM and the supervised SVM classifier for answer selection.

Each word feature in the WWFS is weighted using its ISF value, ISF(wj, Ci) = N(wj, Ci) / N(wj), where N(wj) is the number of snippets containing word feature wj, and N(wj, Ci) is the number of snippets in cluster Ci containing word feature wj.

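Under this reading of the ISF weight (the fraction of the snippets containing wj that fall in cluster Ci), the weighting can be sketched as:

    def isf_weights(clusters, vocabulary):
        # N(w_j): snippets containing w_j; N(w_j, C_i): snippets in
        # cluster C_i containing w_j; weight = N(w_j, C_i) / N(w_j).
        n_total = {w: 0 for w in vocabulary}
        n_cluster = {}
        for name, snippets in clusters.items():
            for snippet in snippets:
                tokens = set(snippet.split())
                for w in vocabulary:
                    if w in tokens:
                        n_total[w] += 1
                        n_cluster[(w, name)] = n_cluster.get((w, name), 0) + 1
        return {(w, c): n / n_total[w] for (w, c), n in n_cluster.items()}
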
When constructing the question vector, we assume that the question is an ideal question that contains all of the extracted WWFS words. Therefore, the values of the WWFS word features in the question vector are all 1. Similarly, the values of the SBFS and BMFS features in the question vector are estimated by a self-similarity calculation.

4 Experiments

4.1 Data Preparation

Because no English named entity recognition (NER) tool was available to us at the time of the experiments, we validate the U-SVM in terms of Chinese web QA using three test data sets, which will be published with this paper. (Currently, no public testing question set for simplified Chinese QA is available.)
Although the U-SVM is independent of the question type, for convenience in candidate extraction, only those questions whose answers are named entities are selected.

The three test data sets are CTREC04, CTREC05, and CTEST05. CTREC04 is a set of 178 Chinese questions translated from the TREC 2004 FACTOID test questions. CTREC05 is a set of 279 Chinese questions translated from the TREC 2005 FACTOID test questions. CTEST05 is a set of 178 Chinese questions found in [Wu et al. 2004] that are similar to TREC test questions except that they are written in Chinese.

Figure 2 breaks down the types of questions (manually assigned) in the CTREC04 and CTREC05 data sets. Here, PER, LOC, ORG, TIM, NUM, and CR refer to questions whose answers are a person, location, organization, time, number, and book or movie, respectively.

To collect the question-answer training data for the S-SVM, we submitted 807 Chinese questions to Google and extracted the candidates for each question from the top 50 Google snippets. We then manually selected the snippets containing the correct answers as positive snippets, and designated all of the other snippets as negative snippets. Finally, we collected 807 hand-tagged Chinese question-answer pairs, called CTRAIN-DATA, as the training data of the S-SVM.

4.2 Evaluation Method

In the experiments, the top m (= 50) Google snippets are used to extract candidates with a Chinese NER tool [Wu et al. 2005]. The number n of candidates extracted from the top m (= 50) snippets is adaptive for different questions, but it does not exceed 30.

The results are evaluated in terms of two scores, top_n and mrr_5. Here, top_n is the rate at which at least one correct answer is included in the top n answers, while mrr_5 is the average reciprocal rank (1/r) of the highest rank r (r ≤ 5) of a correct answer to each question.

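Given each question's ranked candidate list and its gold answer, the two scores can be computed as in this sketch:

    def top_n_score(ranked_lists, gold_answers, n):
        # Rate at which a correct answer appears in the top n answers.
        hits = sum(1 for answers, gold in zip(ranked_lists, gold_answers)
                   if gold in answers[:n])
        return hits / len(gold_answers)

    def mrr_5_score(ranked_lists, gold_answers):
        # Average reciprocal rank 1/r of the highest-ranked correct
        # answer, counting only ranks r <= 5.
        total = 0.0
        for answers, gold in zip(ranked_lists, gold_answers):
            for rank, answer in enumerate(answers[:5], start=1):
                if answer == gold:
                    total += 1.0 / rank
                    break
        return total / len(gold_answers)
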
The Retrieval-M selects as the correct answer the candidate with the shortest total distance to all of the question keywords.
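A minimal sketch of this baseline over a tokenized snippet, assuming single-token candidates and keywords:

    def retrieval_m(candidates, keywords, tokens):
        # Pick the candidate whose summed token distance to the
        # question keywords present in the snippet is smallest.
        def total_distance(candidate):
            ci = tokens.index(candidate)
            dists = []
            for keyword in keywords:
                positions = [j for j, t in enumerate(tokens) if t == keyword]
                if positions:
                    dists.append(min(abs(ci - j) for j in positions))
            return sum(dists) if dists else float("inf")
        present = [c for c in candidates if c in tokens]
        return min(present, key=total_distance) if present else None
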
In this experiment, the Retrieval-M is implemented based on the snippets returned by Google, while the U-SVM is based on the SGS data and the SBFS and BMFS features. Table 3 summarizes the comparative performance.

[Table 3: Comparison of Retrieval-M and U-SVM]

The table shows that the U-SVM greatly improves on the performance of the Retrieval-M: the top_1 improvements for CTREC04 and CTREC05 are about 25.8% and 16.0%, respectively.
This experiment demonstrates that the assumptions used here in clustering web search results and in classifying the question are effective in many cases, and that the U-SVM benefits from these assumptions.

To explore the effectiveness of our unsupervised model as compared with the supervised model, we conduct a cross-model comparison of the S-SVM and the U-SVM with the SBFS and BMFS feature sets. The U-SVM results are compared with those of the S-SVM trained on CTRAIN-DATA.
These comparisons show the following:

• The proposed U-SVM significantly outperforms the S-SVM for all measurements and all test data sets. For the CTREC04 test data set, the top_1 improvements for the FGS and SGS data are about 14.5% and 14.4%, respectively.
For the CTREC05 test data set, the top_1 score for the FGS data increases from 30.0% to 48.0%, and the top_1 score for the SGS data increases from 33.3% to 50.0%.
Note that the SBFS and BMFS features used here are fewer than the features in [Ittycheriah et al. 2001; Suzuki et al. 2002], but the comparison is still valid because the models are compared in terms of the same features.
In the S-SVM, all questions share the same training data, while the U-SVM uses unique pseudo-training data for each question. This is the main reason why the U-SVM performs better than the S-SVM does.

• The SGS data is greatly helpful for both the U-SVM and the S-SVM. Compared with the FGS data, the SGS data yields clear improvements. The reasons for this improvement are that the data sparseness in the FGS data is partially resolved, and that using the Web to introduce data redundancy is helpful [Clarke et al. 2001; Magnini et al. 2002; Dumais et al. 2002].

In the S-SVM, all of the test questions share the same hand-tagged training data, so the WWFS features cannot be easily used [Ittycheriah et al. 2002; Suzuki et al. 2002].

Tables 6 and 7 compare the performance of the U-SVM with the (SBFS + BMFS) features, the WWFS features, and the combination of the three types of features for the CTREC04 and CTREC05 test data sets, respectively.

[Table 6: Performance of the U-SVM with Different Features on CTREC04]
[Table 7: Performance of the U-SVM with Different Features on CTREC05]

These tables show that combining the three types of features improves the performance of the U-SVM. Using the combination of features with the CTREC04 test data set yields the best performance: 60.82% / 71.31% / 88.66% for top_1 / mrr_5 / top_5.
Similarly, compared with using the (SBFS + BMFS) features or the WWFS features alone, the improvements obtained by using the combination of features also demonstrate that the (SBFS + BMFS) features are more important than the WWFS features.

These comparative experiments indicate that the U-SVM performs better than the S-SVM does, even though the U-SVM is an unsupervised technique and no hand-tagged training data is provided. The average top_1 improvements for both test data sets are more than 20%.

To compare the U-SVM with the Pattern-M, we use the CTEST05 test data set, whose question types are shown in Figure 3.
CTEST05 includes 14 different question types, for example, Inventor-Stuff (with questions like "Who invented the telephone?"), Event-Day (with questions like "When is World Day for Water?"), and so on.
The Pattern-M uses the dependency-syntactic answer patterns learned in [Wu et al. 2007] to extract the answer, and named entities are also used to filter noise from the candidates.

Table 8 summarizes the performance of the U-SVM, Pattern-M, and S-SVM models on CTEST05. Two factors limit the performance of the Pattern-M:

• The Chinese dependency parser influences the extraction of the dependency-syntactic answer patterns, and thus degrades the performance of the Pattern-M model.
• The imperfection of Google snippets affects pattern matching, and thus adversely influences the Pattern-M model.

From the cross-model comparison, we conclude that the performance ranking of these models is: U-SVM > Pattern-M > S-SVM > Retrieval-M.

5 Conclusion and Future Work

This paper presents an unsupervised machine learning technique (called the U-SVM) for answer selection that is validated on Chinese open-domain web QA. Regarding answer selection as a kind of classification task, the U-SVM automatically learns clusters and pseudo-training data for each cluster by clustering web search results. It then selects the correct answer from the candidates by classifying the question.

The contribution of this paper is that it presents an unsupervised machine learning technique for web QA that starts with only a user question. The results of our experiments with three test data sets are encouraging.
As compared with the S-SVM, the top_1 performance of the U-SVM for the CTREC04 and CTREC05 data sets is significantly improved, by more than 20%. Moreover, the U-SVM performs better than the Retrieval-M and the Pattern-M.

So far, the U-SVM has been validated only on FACTOID test questions. In fact, our technique is independent of question types as long as the candidates can be extracted. In the future, we will explore the effectiveness of our technique for the other types of questions.

The clustering of web search results in the U-SVM assumes that a candidate in a Google snippet can represent the "signature" of its cluster. This assumption, however, is not always effective.
To filter noise in the pseudo-training data, we will extract relations between the candidates and the keywords as the cluster signatures of Google snippets. Moreover, applying the U-SVM to QA systems in other languages, like English and Japanese, will also be included in our future work.
