The
technology
of
opinion
extraction
allows
users
to
retrieve
and
analyze
people
's
opinions
scattered
over
Web
documents
.
We
define
an
opinion
unit
as
a
quadruple
consisting
of
the
opinion
holder
,
the
subject
being
evaluated
,
the
part
or
the
attribute
in
which
the
subject
is
evaluated
,
and
the
value
of
the
evaluation
that
expresses
a
positive
or
negative
assessment
.
We
use
this
definition
as
the
basis
for
our
opinion
extraction
task
.
We
focus
on
two
important
subtasks
of
opinion
extraction
:
(
a
)
extracting
aspect-evaluation
relations
,
and
(
b
)
extracting
aspect-of
relations
,
and
we
approach
each
task
using
methods
which
combine
contextual
and
statistical
clues
.
Our
experiments
on
Japanese
weblog
posts
show
that
the
use
of
contextual
clues
improves
the
performance
for
both
tasks
.
1
Introduction
The
explosive
increase
in
Web
communication
has
attracted
increasing
interest
in
technologies
for
automatically
mining
personal
opinions
from
Web
documents
such
as
product
reviews
and
weblogs
.
Such
technologies
would
benefit
users
who
seek
reviews
on
certain
consumer
products
of
interest
.
Previous
approaches
to
the
task
of
mining
a
large-scale
document
collection
of
customer
opinions
(
or
reviews
)
can
be
classified
into
two
approaches
:
document
classification
and
information
extraction
.
The
former
is
the
task
of
classifying
documents
or
passages
according
to
their
semantic
orientation
such
as
positive
vs.
negative
.
This
direction
has
been
forming
the
mainstream
of
research
on
opinion-sensitive
text
processing
(
Pang
et
al.
,
2002
;
Turney
,
2002
,
etc.
)
.
The
latter
,
on
the
other
hand
,
focuses
on
the
task
of
extracting
opinions
consisting
of
information
about
,
for
example
,
"who feels how about which aspect of what product"
from
unstructured
text
data
.
In
this
paper
,
we
refer
to
this
information
extraction-oriented
task
as
opinion
extraction
.
In
contrast
to
sentiment
classification
,
opinion
extraction
aims
at
producing
richer
information
and
requires
an
in-depth
analysis
of
opinions
,
which
has
only
recently
been
attempted
by
a
growing
but
still
relatively
small
research
community
(
Yi
et
al.
,
2003
;
Hu
and
Liu
,
2004
;
Popescu
and
Etzioni
,
2005
,
etc.
)
.
Most
previous
work
on
customer
opinion
extraction
assumes
the
source
of
information
to
be
customer
reviews
collected
from
customer
review
sites
(
Popescu
and
Etzioni
,
2005
;
Hu
and
Liu
,
2004
;
Liu
et
al.
,
2005
)
.
In
contrast
,
in
this
paper
,
we
consider
the
task
of
extracting
customer
opinions
from
unstructured
weblog
posts
.
Compared
with
extraction
from
review
articles
,
extraction
from
weblogs
is
more
challenging
because
weblog
posts
tend
to
exhibit
greater
diversity
in
topics
,
goals
,
vocabulary
,
style
,
etc.
and
are
much
more
likely
to
include
descriptions
irrelevant
to
the
subject
in
question
.
In
this
paper
,
we
first
describe
our
task
setting
of
opinion
extraction
.
We
conducted
a
corpus
study
and
investigated
the
feasibility
of
the
task
definition
by
showing
the
statistics
and
inter-annotator
agreement
of
our
corpus
annotation
.
Next
,
we
show
that
the
crucial
body
of
the
above
opinion
extraction
task
can
be
decomposed
into
two
kinds
of
relation
extraction
,
i.e.
aspect-evaluation
relation
extraction
and
aspect-of
relation
extraction
.
For
example
,
the
passage
"
I
went
out
for
lunch
at
the
Deli
and
ordered
a
curry
with
chicken
.
It
was
pretty
good
"
has
an
aspect-evaluation
relation
⟨curry with chicken, was good⟩ and an aspect-of relation ⟨The Deli, curry with chicken⟩
.
The
former
task
can
be
regarded
as
a
special
type
of
predicate-argument
structure
analysis
or
semantic
role
labeling
.
The
latter
,
on
the
other
hand
,
can
be
regarded
as
bridging
reference
resolution
(
Clark
,
1977
)
,
which
is
the
task
of
identifying
relations
between
definite
noun
phrases
and
discourse-new
entities
implicitly
related
to
some
previously
mentioned
entities
.
Most
of
the
previous
work
on
customer
opinion
extraction
,
however
,
does
not
adopt
the
state-of-the-art
techniques
in
those
fields
,
relying
only
on
simple
proximity-based
or
pattern-based
methods
.
In
this
context
,
this
paper
empirically
shows
that
incorporating
machine
learning-based
techniques
devised
for
predicate-argument
structure
analysis
and
bridging
reference
resolution
improves
the
performance
of
both
aspect-evaluation
and
aspect-of
relation
extraction
.
Furthermore
,
we
also
show
that
combining
contextual
clues
with
a
common
co-occurrence
statistics-based
technique
for
bridging
reference
resolution
makes
a
significant
improvement
on
aspect-of
relation
extraction
.
2
Opinion
extraction
:
Task
design
Our
present
goal
is
to
build
a
computational
model
to
extract
opinions
from
Web
documents
in
such
a
form
as
:
Who
feels
how
on
which
aspects
of
which
subjects
.
Given
the
passage
presented
in
Figure
1
,
for
example
,
the
opinion
we
want
to
extract
is
:
"
the
writer
feels
that
the
colors
of
pictures
taken
with
Powershot
(
product
)
are
beautiful
.
"
As
suggested
by
this
example
,
we
consider
it
reasonable
to
start
with
an
assumption
that
most
evaluative
opinions
can
be
structured
as
a
frame
composed
of
the
following
constituents
:
Opinion holder: The person who is making an evaluation. An opinion holder is typically the first person (the author). We say the opinion holder is unspecified if the opinion is mentioned as a rumor.
Subject: A named entity (product or company) of a given particular class of interest (e.g. a car model name in the automobile domain).
Aspect: A part, member or related object, or an attribute (of a part) of the subject on which the evaluation is made (engine, size, etc.).
Evaluation: An evaluative or subjective phrase used to express an evaluation or the opinion holder's mental/emotional attitude (good, poor, powerful, stylish, (I) like, (I) am satisfied, etc.).
According
to
this
typology
,
the
example
in
Figure
1
has
six
constituents
,
the
writer
(
opinion
holder
)
,
Powershot
(
subject
)
,
pictures
(
aspect
)
,
colors
(
aspect
)
,
beautiful
(
evaluation
)
,
easy
to
grip
(
evaluation
)
,
and
constitute
two
units
of
opinions
as
presented
in
the
right
half
of
the
figure
.
We
call
such
a
unit
an
opinion
unit
.
In
this
paper
,
we
only
consider
explicitly
mentioned
evaluative
opinions
as
our
targets
of
extraction
,
excluding
opinions
indirectly
expressed
through
,
for
example
,
style
or
language
choice
from
our
scope
.
Under
this
assumption
,
opinion
extraction
can
be
defined
as
a
task
of
filling
a
fixed
number
of
slots
as
above
for
each
of
the
evaluations
expressed
in
a
given
text
collection
.
Two
issues
then
immediately
arise
.
First
,
it
is
necessary
to
make
sure
that
the
definition
of
the
opinion
units
is
clear
enough
for
human
annotators
to
be
able
to
carry
out
the
task
with
sufficient
accuracy
.
Second
,
all
the
slots
might
not
consist
of
simple
expressions
in
that
the
filler
of
an
aspect
slot
may
have
a
hierarchical
structure
in
itself
.
For
example
,
"
the
leather
cover
of
the
seats
(
of
a
car
)
"
refers
to
a
part
of
a
part
of
a
car
.
In
theory
,
such
a
hierarchical
chain
can
be
of
any
length
,
which
may
affect
the
feasibility
of
the
task
.
For
tackling
these
issues
,
we
built
a
corpus
annotated
with
the
above
sort
of
information
and
investigated
the
feasibility
of
the
task
.
2.1
Corpus
study
We
first
collected
116
Japanese
weblog
posts
in
the
restaurant
domain
by
randomly
sampling
from
a
collection
of
posts
classified
under
the
"
gourmet
"
category
on
a
major
blog
site
:
.
We
asked
two
annotators
to
annotate
them
independently
of
each
other
following
the
above
specification
.
The
annotators
first
identified
evaluative
phrases
,
and
then
for
each
evaluative
phrase
judged
whether
it
was
concerning
a
particular
subject
(
i.e.
a
restaurant
)
in
the
given
domain
.
If
judged
yes
,
the
annotators
filled
the
opinion
holder
and
subject
slots
obligatorily
.
The
annotators
filled
the
aspect
slot
only
when
its
filler
appeared
in
the
document
and
identified
the
hierarchical
relations
between
aspects
if
any
(
e.g.
noodle
and
its
volume
)
.
Note
that
,
if
a
sentence
has
two
or
more
evaluations
,
they
have
to
make
one
opinion
unit
for
each
.
2.1.1
Inter-annotator
agreement
We
investigated
the
degree
of
inter-annotator
agreement
.
In
the
task
of
identifying
evaluations
,
one
annotator
A1 identified 450 evaluations while the other A2
identified
392
,
and
329
cases
of
them
coincided
.
The
two
annotators
did
not
identify
the
same
number
of
evaluations
,
so
instead
of
using
kappa
statistics
,
we
use
the
following
metric
for
measuring
agreement
as
Wiebe
et
al.
(
2005
)
do
:
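$$\mathrm{agr}(A_1 \,\|\, A_2) = \frac{\#\text{ of tags agreed by } A_1 \text{ and } A_2}{\#\text{ of tags annotated by } A_1}$$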
The
F1
measure
of
the
agreement
between
the
two
was
therefore
0.79
,
which
indicates
that
humans
can
identify
evaluation
at
a
reasonable
level
.
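A directional agreement score in the style of Wiebe et al. (2005), and the F1 value derived from its two directions, can be sketched as follows. The function names `agr` and `f1_agreement`, and the representation of each tag as a hashable (sentence id, phrase) pair, are assumptions made for this illustration and not a description of the tooling used in the study; the annotations are toy data.

```python
def agr(a, b):
    """Directional agreement agr(a || b): share of a's tags that b also marked."""
    a, b = set(a), set(b)
    if not a:
        return 0.0
    return len(a & b) / len(a)


def f1_agreement(a, b):
    """Harmonic mean of the two directional agreement scores."""
    p, r = agr(a, b), agr(b, a)
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# Toy annotations (hypothetical): each tag is a (sentence id, phrase) pair.
ann1 = {(1, "good"), (2, "tasty"), (3, "cheap"), (4, "slow")}
ann2 = {(1, "good"), (2, "tasty"), (5, "noisy")}
print(agr(ann1, ann2), agr(ann2, ann1), f1_agreement(ann1, ann2))
```

Because the two annotators mark different numbers of tags, the two directions generally differ, which is why a symmetric summary such as F1 is reported alongside them.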
Next
,
we
investigated
the
inter-annotator
agreement
of
the
aspect-evaluation
and
subject-evaluation
relations
.
Annotator A1 identified 328 relations, and A2 identified 346 relations. The agr score reported for this task was 0.80 (F1 measure was 0.79), which shows that the human annotators can carry out the task at a reasonable accuracy. This shows that we obtained high consistency.

[Table 1 row labels: articles, sentences, opinion units; Asp-Eval, Asp-Asp, Subj-Asp, Subj-Eval; Subj-Asp-Eval, Subj-Asp-Asp-Eval, other; non-writer opinion holder]
Based
on
this
corpus
study
,
we
believe
that
our
definitions
of
two
relations
are
clear
enough
for
constructing
an annotated
corpus
.
2.1.2
Opinion-annotated
corpus
Based
on
these
results
,
we
collected
a
larger
set
of
weblog
posts
in
four
domains
:
restaurant
,
automobile
,
cellular
phone
and
video
game
.
We
then
asked
annotator
A
to
annotate
them
in
the
same
annotation
scheme
as
above
.
The
results
are
summarized
in
Table
1.
I
in
the
table
shows
the
number
of
the
identified
opinion
units
and
relations
,
and
II
shows
the
number
of
hierarchical
chains
of
aspects
.
For
example
,
"
Nokia
6800
has
a
nice
color
screen
"
is
counted
1
as
"
Subj-Asp-Eval
"
since
this
example
includes
a
subject
"
Nokia
6800
"
,
an
aspect
"
color
screen
"
and
an
evaluation
"
nice
"
.
"
Other
"
indicates
the
number
of
the
cases
where
the
length
of
hierarchical
chains
of
aspects
is
three
or
more
.
One
observation
is
that
,
for
all
the
domains
,
90
%
of
all
the
opinion
units
have
a
hierarchical
chain
of
aspects
whose
length
is
two
or
less
.
From
this
,
we
can
conclude
that
hierarchical
chains
longer
than
two
are
rare
,
and
the
problem
is
not
so
complicated
,
though
they
can
be
of
any
length
in
theory
.
The
row
of
"
Non-writer
op
(
inion
)
holder
"
at
the
bottom
of
Table
1
shows
the
number
of
opinion
units
whose
opinion
holder
is
not
the
writer
of
the
weblog
.
This
result
indicates
that
when
an
evaluative
expression
is
found
,
its
opinion
holder
is
highly
likely
to
be
the
writer
of
the
blogs
.
Therefore
,
we
put
aside
the
task
of
filling
the
opinion
holder
slot
in
this
paper
.
2.2
Related
work
on
extraction
task
settings
of
opinion
There
are
several
studies
on
customer
opinion
extraction
.
Hu
and
Liu
(
2004
)
considered
the
task
of
extracting
⟨Aspect, Sentence, Semantic-orientation⟩
triples
in
our
terminology
,
where
Sentence
is
the
one
that
includes
the
Aspect
,
and
Semantic-orientation
is
either
positive
or
negative
.
For
example
,
our
previous
paper
(
Kobayashi
et
al.
,
2005
)
addresses
the
task
of
extracting
⟨Subject, Aspect, Evaluation⟩
.
However
,
none
of
those
papers
reports
on
such
an
extensive
corpus
study
as
what
we
report
in
this
paper
.
In
addition
,
in
this
paper
,
we
consider
not
only
aspect-evaluation
relations
but
also
hierarchical
chains
of
subject-aspect
and
aspect-aspect
relations
,
which
have
never
been
addressed
in
previous
work
.
Open-domain
opinion
extraction
is
another
trend
of
research
on
opinion
extraction
,
which
aims
to
extract
a
wider
range
of
opinions
from
such
texts
as
newspaper
articles
(
Yu
and
Hatzivassiloglou
,
2003
;
Kim
and
Hovy
,
2004
;
Wiebe
et
al.
,
2005
;
Choi
et
al.
,
2006
)
.
To
the
best
of
our
knowledge
,
one
of
the
most
extensive
corpus
studies
in
this
field
has
been
conducted
in
the
MPQA
project
(
Wiebe
et
al.
,
2005
)
;
while
their
concerns
include
the
types
of
opinions
we
consider
,
they
annotate
newspaper
articles
,
which
presumably
exhibit
considerably
different
characteristics
from
customer-generated
texts
.
Though
we
do
not
discuss
the
problem
of
determining
semantic
orientation
,
we
assume
availability
of
state-of-the-art
methods
that
perform
this
task
(
Suzuki
et
al.
,
2006
;
Takamura
et
al.
,
2006
,
etc.
)
.
The
problem
of
determining
semantic
orientation
will
be
solved
by
using
these
techniques
,
so
we
focus
on
the
main
issue
:
extracting
opinion
units
from
given
texts
.
3
Method
for
opinion
extraction
Before
designing
a
model
for
our
opinion
extraction
task
,
it
is
important
to
note
that
aspect
phrases
are
open-class
expressions
and
tend
to
be
heavily
domain-dependent
.
In
fact
,
according
to
our
investigation
on
our
opinion-annotated
corpus
,
the
number
of
aspect
types
is
nearly
3,200
,
and
only
3%
of
them
appear
in
two
or
more
domains
as
shown
in
Figure
2.
For
evaluation
expressions
,
on
the
other
hand
,
the
number
of
types
is
much
smaller
than
that
of
aspect
expressions
,
and
27%
of
them
appear
in
multiple
domains
.
This
indicates
that
evaluation
expressions
are
more
likely
to
be
used
commonly
across
different
domains
compared
with
aspects
.
To
prove
this
assumption
,
we
created
a
dictionary
of
evaluation
expressions
from
customer
reviews
of
automobiles
(
230,000
sentences
in
total
)
using
the
semi-automatic
method
proposed
by
Kobayashi
et
al.
(
2004
)
.
We
expanded
the
dictionary
by
hand
with
external
resources
including
publicly
available
ordinary
thesauri
.
As
a
result
,
we
collected
5,550
entries
.
According
to
our
investigation
of
the
coverage
by
the
dictionary
,
0.84
(
restaurant
)
,
0.88
(
cellular
phone
)
,
0.91
(
automobile
)
,
and
0.93
(
video
game
)
of
the
evaluations
annotated
in
our
corpus
are
covered
by
the
dictionary
.
From
this
observation
,
we
consider
that
it
is
reasonable
to
start
opinion
extraction
with
the
identification
of
evaluation
expressions
.
We
therefore
design
the
process
of
extracting
⟨Subject, Aspect, Evaluation⟩
as
follows
:
1.
Aspect-evaluation
relation
extraction
:
For
each
of
the
candidate
evaluations
that
are
selected
from
a
given
document
by
dictionary
look-up
,
identify
the
target
of
the
evaluation
.
Here
the
identified
target
may
be
a
subject
(
e.g.
IXY
(
is
well-designed
)
)
or
an
aspect
of
a
subject
(
e.g.
the
quality
(
is
amazing
)
)
.
Hereafter
,
we
use
the
term
aspect
to
refer
to
both
an
aspect
and
a
subject
itself
,
since
the
subject
can
be
regarded
as
the
top
element
in
the
hierarchical
chain
of
aspects
.
2. Opinion-hood determination: Judge whether or not the obtained pair ⟨aspect, evaluation⟩ is an expression of an opinion by considering the given context. If it is judged yes, go to step 3; otherwise, return to step 1 with a new candidate evaluation expression.
3. Aspect-of
relation
extraction
:
If
the
identified
aspect
is
not
a
subject
,
search
for
its
antecedent
,
i.e.
an
expression
that
is
a
higher
aspect
or
a
subject
of
the
current
aspect
.
Repeat
step
3
until
reaching
a
subject
or
no
parent
is
found
.
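The control flow of the three steps above can be sketched as below. This is only an outline under stated assumptions: the callables `find_aspect`, `is_opinion`, `find_parent` and `is_subject` are hypothetical stand-ins for the models described in the following sections, and the toy lookup tables are invented for illustration.

```python
def extract_opinion_units(evaluations, find_aspect, is_opinion, find_parent, is_subject):
    """Run the three-step procedure for each candidate evaluation expression."""
    units = []
    for evaluation in evaluations:
        # Step 1: aspect-evaluation relation extraction.
        aspect = find_aspect(evaluation)
        if aspect is None:
            continue
        # Step 2: opinion-hood determination; rejected pairs are skipped.
        if not is_opinion(aspect, evaluation):
            continue
        # Step 3: aspect-of relation extraction, repeated until a subject
        # is reached or no parent is found (a seen-set guards against cycles).
        chain, seen = [aspect], {aspect}
        while not is_subject(chain[-1]):
            parent = find_parent(chain[-1])
            if parent is None or parent in seen:
                break
            chain.append(parent)
            seen.add(parent)
        units.append((list(reversed(chain)), evaluation))
    return units


# Toy stand-ins (hypothetical) for the learned models.
targets = {"beautiful": "colors", "good": "weather"}
aspect_of = {"colors": "pictures", "pictures": "Powershot"}
units = extract_opinion_units(
    ["beautiful", "good"],
    find_aspect=targets.get,
    is_opinion=lambda aspect, evaluation: aspect != "weather",
    find_parent=aspect_of.get,
    is_subject=lambda term: term == "Powershot",
)
print(units)
```

Each returned unit lists the chain from the subject down to the evaluated aspect, followed by the evaluation expression.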
3.1
Related
work
on
opinion
extraction
A
common
approach
to
the
customer
opinion
extraction
task
mainly
uses
simple
proximity- or
pattern-based
techniques
.
For
example
,
Tateishi
et
al.
(
2004
)
implement
five
syntactic
patterns
and
Popescu and Etzioni
(
2005
)
use
ten
syntactic
patterns
.
Such
an
approach
is
limited
in
two
respects
.
First
,
it
assumes
the
availability
of
a
list
of
potential
aspect
expressions
as
well
as
evaluation
expressions
;
however ,
creating
such
a
list
of
aspects
for
a
variety
of
domains
can
be
prohibitively
expensive
because
of
the
domain
dependency
of
aspect
expressions
.
In
contrast
,
our
method
does
not
require
any
aspect
lexicon
.
Second
,
their
approach
lacks
the
perspective
of
viewing
aspect-evaluation
extraction
as
a
specific
type
of
predicate-argument
structure
analysis
,
i.e.
the
task
of
identifying
the
arguments
of
a
given
predicate
in
a
given
text
,
and
fails
to
benefit
from
the
state-of-the-art
techniques
of
this
rapidly
growing
field
.
The
syntactic
patterns
used
in
their
research
are
analyzed
by
a
dependency
parser
;
however
,
aspect-evaluation
relations
appear
in
diverse
syntactic
patterns
,
which
cannot
be
easily
captured
by
a
handful
of
manually
devised
rules
.
An
exception
is
the
model
reported
by
Kanayama
et
al.
(
2004
)
,
which
uses
a
component
of
an
existing
MT
system
to
identify
the
"
aspect
"
argument
of
a
given
"
evaluation
"
predicate
.
However
,
the
MT
component
they
use
is
not
publicly
available
,
and
even
if
it
were
,
it
would
be
difficult
to
apply
it
to
tasks
in
hand
due
to
the
opaqueness
of
its
mechanism
.
Our
approach
aims
to
develop
a
more
generally
applicable
model
of
aspect-evaluation
extraction
.
In
open-domain
opinion
extraction
,
some
approaches
use
syntactic
features
obtained
from
parsed
input
sentences
(
Choi
et
al.
,
2006
;
Kim
and
Hovy
,
2006
)
,
as
is
commonly
done
in
semantic
role
labeling
.
Choi
et
al.
(
2006
)
address
the
task
of
extracting
opinion
entities
and
their
relations
,
and
incorporate
syntactic
features
into
their
relation
extraction
model
.
Kim
and
Hovy
(
2006
)
proposed
a
method
for
extracting
opinion
holders
,
topics
and
opinion
words
,
in
which
they
use
semantic
role
labeling
as
an
intermediate
step
to
label
opinion
holders
and
topics
.
However
,
these
approaches
do
not
address
the
task
of
extracting
aspect-of
relations
and
make
use
of
syntactic
features
only
for
labeling
opinion
holders
and
topics
.
In
contrast
,
as
we
describe
below
,
we
find
the
significant
overlap
between
aspect-evaluation
relation
extraction
and
aspect-of
relation
extraction
and
apply
the
same
approach
to
both
tasks
,
gaining
the
generality
of
the
model
.
Aspect-of
relations
can
be
regarded
as
a
sub-type
of
bridging
reference
(
Clark
,
1977
)
,
which
is
a
common
linguistic
phenomenon
where
the
referent
of
a
definite
noun
phrase
refers
to
a
discourse-new
entity
implicitly
related
to
some
previously
mentioned
entity
.
For
example
,
we
can
see
a
relation
of
bridging
reference
between
"
the
door
"
and
"
the
room
"
in
"
She
entered
the
room
.
The
door
closed
automatically
.
"
A
common
approach
is
to
use
co-occurrence
statistics
between
the
referring
expression
(
e.g.
"
the
door
"
in
the
above
example
)
and
the
related
entity
(
"
the
room
"
)
(
Bunescu
,
2003
;
Poesio
et
al.
,
2004
)
.
Our
approach
newly
incorporates
automatically
induced
syntactic
patterns
as
contextual
clues
into
such
a
co-occurrence
model
,
producing
significant
improvements
of
accuracy
.
3.2
Our
approach
Now
we
describe
our
approach
to
aspect-evaluation
and
aspect-of
relation
extraction
.
The
key
idea
is
to
combine
the
following
two
kinds
of
information
using
a
machine
learning
technique
for
both
tasks
.
Contextual
clues
:
Syntactic patterns, such as one which matches a sentence like

⟨sekkyaku⟩-ga kunrens-aretei-te ⟨kimochiyoi⟩
⟨service⟩- be trained- ⟨feel comfortable⟩
(The waiters were well-trained, so I felt comfortable.)

are considered to be useful for extracting relations between slot fillers when they appear in a single sentence (here, ⟨ ⟩ indicates a slot filler).
We
employ
a
supervised
learning
technique
to
search
for
such
useful
syntactic
patterns
.
Context-independent
statistical
clues
:
Statistics
such
as
aspect-aspect
and
aspect-evaluation
co-occurrences
are
expected
to
be
useful
.
We
obtain
such
statistical
clues
automatically
from
a
large
collection
of
raw
documents
.
In
what
follows
,
we
describe
our
method
for
aspect-evaluation relation extraction
.
The
aspect-of
relation
extraction
is
done
in
an
analogous
way
.
3.2.1
Supervised
learning
of
contextual
clues
Let
us
consider
the
problem
of
searching
for
the
aspect
of
a
given
evaluation
expression
t.
This
problem
can
be
decomposed
into
binary
classification
problems
of
deciding
whether
each
pair
of
candidate
aspect
c
and
target
t
is
in
an
aspect-evaluation
relation
or
not
.
Our
goal
is
to
learn
a
discrimination
function
for
this
classification
problem
.
If
such
a
function
is
obtained
,
we
can
identify
the
most
likely
candidate
aspect
simply
by
selecting
the
best
scored
c-t
pair
and
,
if
its
score
is
negative
for
all
possible
candidates
,
we
conclude
that
t
has
no
corresponding
aspect
in
the
candidate
set
.
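The selection rule described above can be illustrated with a short sketch. The function name `select_aspect` and the toy scores are assumptions for this example; the `score` callable stands in for the learned discrimination function. A score of exactly zero is treated as a rejection here, which is a choice of this sketch rather than something the text specifies.

```python
def select_aspect(candidates, target, score):
    """Return the best-scored candidate for `target`, or None if none scores positively."""
    best, best_score = None, None
    for candidate in candidates:
        s = score(candidate, target)
        if best_score is None or s > best_score:
            best, best_score = candidate, s
    # No candidates, or every candidate rejected by the classifier.
    if best_score is None or best_score <= 0:
        return None
    return best


# Hypothetical classifier outputs for the evaluation "oishii" (delicious).
toy_scores = {"cake": 1.2, "shop": -0.4, "Darling": -0.1}
print(select_aspect(list(toy_scores), "oishii", lambda c, t: toy_scores[c]))
```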
For
finding
syntactic
patterns
that
extract
an
aspect
c
starting
with
an
evaluation
t
,
we
first
represent
all
the
sentences
in
the
annotated
corpus
that
have
both
an
aspect
and
its
evaluation
,
as
shown
in
Figure
3.
A
sentence
is
analyzed
by
a
dependency
parser
,
then
the
dependency
tree
is
converted
so
as
to
represent
the
relation
between
content
words
clearly
and
to
attach
other
information
(
such
as
POS
labels
and
other
morphological
features
of
content
words
and
the
functional
words
attached
to
the
content
words
)
as
shown
in
the
lower
part
of
Figure
3.
Among
various
classifier
induction
algorithms
for
tree-structured
data
,
in
our
experiments
,
we
have
so
far
examined
Kudo and Matsumoto's (2004) algorithm
,
packaged
as
a
free
software
named
BACT
.
Given
a
set
of
training
examples
represented
as
ordered
trees
labeled
either
positive
or
negative
class
,
this
algorithm
learns
a
list
of
weighted
decision
stumps
as
a
discrimination
function
with
the
Boosting
algorithm
.
Each
decision
stump
is
associated
with
a tuple ⟨s, l, w⟩
,
where
s
is
a
subtree
appearing
in
the
training
set
,
l
a
label
,
and
w
a
weight
of
this
pattern
.
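How a list of weighted decision stumps yields a score can be sketched as follows. This is a simplified illustration, not the BACT implementation: it assumes the common formulation in which a stump votes its label when its subtree occurs in the example and the opposite label otherwise, and it represents each example as a set of subtree strings instead of performing real subtree matching on ordered trees. The stumps and weights are invented.

```python
def stump_vote(stump, subtrees):
    """Weighted vote of one stump (s, l, w): +l if s occurs in the example, else -l."""
    s, label, weight = stump
    vote = label if s in subtrees else -label
    return weight * vote


def discriminate(stumps, subtrees):
    """Sum of weighted votes; a positive total means the positive class."""
    return sum(stump_vote(stump, subtrees) for stump in stumps)


# Hypothetical stumps: (subtree pattern, label in {+1, -1}, weight).
toy_stumps = [
    ("(evaluation (aspect ga))", +1, 0.8),
    ("(node (aspect no))", -1, 0.3),
]
example = {"(evaluation (aspect ga))"}
print(discriminate(toy_stumps, example))
```

Because every stump is tied to an explicit subtree, the learned list can be inspected directly to see which structural patterns carry the most weight.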
The
strength
of
this
algorithm
is
that
it
automatically
acquires
structured
features
and
allows
us
to
analyze
the
utility
of
features
.
Given
a
c-t
pair
in
an
annotated
sentence
,
tree
encoding
of
this
sentence
is
done
as
follows
:
First
,
we
use
a
dependency
parser
to
obtain
a
dependency
tree
as
in
Figure
3
(
a
)
.
We
assume
"
keeki
(
cake
)
"
as
the
candidate
aspect
c
and
"
oishii
(
delicious
)
"
as
the
target
evaluation
t.
We
then
find
the
path
between
t
and
c
together
with
their
daughter
nodes
.
For
example
,
the
node
"
Darling-no
(
Darling
's
)
"
is
kept
since
it
is
a
daughter
of
c.
Then
,
all
the
content
words
are
abstracted
to
one
of
the
class
types
,
evaluation
,
aspect
or
node
,
that
is
,
c
is
renamed
as
"
aspect
"
,
t
as
"
evaluation
"
and
all
other
content
words
as
"
node
"
.
Other
information
of
a
content
word
and
the
information
of
functional
words
attaching
to
the
content
word
are
represented
as
the
leaf
nodes
as
shown
in
Figure
3
(
b
)
.
The
features
used
in
our
experiments
are
summarized
in
Table
2.
We
apply
the
same
method
to
the
aspect-of
relation
extraction
by
replacing
the
"
evaluation
"
label
with
the
second
"
aspect
"
label
.
3.3
Context-independent
statistical
clues
We also introduce the following two kinds of statistical clues.

i. Co-occurrences of aspect-evaluation/aspect-aspect: Among various ways to estimate the strength of association (e.g. the number of hits returned from a search engine), in our experiments, we extracted aspect-aspect and aspect-evaluation co-occurrences in 1.7 million weblog posts using the patterns "⟨aspect⟩ ga/wa/mo ⟨evaluation⟩ (⟨aspect⟩ is (subject-marker) ⟨evaluation⟩)" and "⟨aspect A⟩ no ⟨aspect B⟩ ga/wa (⟨aspect B⟩ of ⟨aspect A⟩ is)". To avoid the data sparseness problem, we use Probabilistic Latent Semantic Indexing (PLSI) (Hofmann, 1999) to estimate conditional probabilities P(Aspect | Evaluation) and P(Aspect A | Aspect B). We then incorporate the information of these probability scores into the learning model described in 3.2 by encoding them as a feature that indicates the relative score rank of each candidate in a given candidate set (see Table 2).
ii
.
Aspect-hood
of
candidate
aspects
:
Aspect-hood
is
an
index
of
the
degree
that
measures
how
plausible
a
term
is
used
as
an
aspect
within
a
given
domain
.
We
consider
that
a phrase that often co-occurs directly with a subject is
likely
to
be
an
aspect
of
the
subject
,
and
extract
the
expression
X
which
appears
in
the
form
"
Subject
no
X
(
X
of
Subject
)
"
and
the
expression
Y
which
appears
in
the
form
"
X
no
Y
"
.
We
calculate
the
aspect-hood
of
the
expressions
X
and
Y
by
the
pointwise
mutual
information
.
This
score
is
also
used
as
a
feature
(
see
Table
2
)
.
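Both statistical clues enter the model as rank features, which the following sketch illustrates. It is a minimal example under assumptions: `pmi` computes pointwise mutual information from raw counts (which events are counted for aspect-hood is not fully specified in the text, so the counts below are hypothetical), and `rank_buckets` maps scores to the buckets 1st, 2nd, 3rd, 4th and other.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information log(P(x, y) / (P(x) P(y))) from raw counts."""
    if min(count_xy, count_x, count_y, total) <= 0:
        return float("-inf")
    return math.log((count_xy * total) / (count_x * count_y))


def rank_buckets(scores):
    """Map each candidate to the bucket of its score rank within the candidate set."""
    names = ["1st", "2nd", "3rd", "4th"]
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {c: (names[i] if i < len(names) else "other") for i, c in enumerate(ordered)}


# Hypothetical counts: (joint count, term count, subject-slot count, corpus size).
toy_scores = {
    "menu": pmi(30, 200, 100, 10000),
    "staff": pmi(10, 400, 100, 10000),
    "yesterday": pmi(1, 500, 100, 10000),
}
print(rank_buckets(toy_scores))
```

Encoding the rank instead of the raw value keeps the feature comparable across candidate sets whose absolute scores differ widely.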
3.4
Intra-/inter-sentential
relation
extraction
Syntactic
pattern
induction
as
described
in
3.2.1
can
apply
only
when
an
aspect-evaluation
(
or
aspect-of
)
relation
appears
in
a
single
sentence
.
We
therefore
build
a
separate
model
for
inter-sentential
relation
extraction
,
which
is
carried
out
after
intra-sentential
relation
extraction
.
1
)
Intra-sentential
relation
identification
:
Given
a
target
evaluation
(
or
aspect
)
,
select
the
most
likely
candidate
aspect
c
within
the
target
sentence
with
the
intra-sentential
model
described
in
3.2.1
.
If
the
score
of
c
is
positive
,
return
c
;
otherwise
,
go
to
the
inter-sentential
relation
extraction
phase
.
2
)
Inter-sentential
relation
identification
:
Search
for
the
most
likely
candidate
aspect
in
the
sentences
preceding
the
target
evaluation
(
or
aspect
)
.
This
task
can
be
regarded
as
a
zero-anaphora
resolution
problem
.
For
this
purpose
,
we
employ
the
supervised
learning
model
for
zero-anaphora
resolution
proposed
by
Iida et al. (2003)
.
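The two-phase cascade can be outlined as follows. The function `find_aspect_cascade` and both scoring callables are hypothetical placeholders for the intra-sentential model and the zero-anaphora resolution model; the text does not state a rejection threshold for the second phase, so this sketch simply returns the best-scored preceding candidate.

```python
def find_aspect_cascade(target, same_sentence, preceding, intra_score, inter_score):
    """Try the intra-sentential model first, then fall back to preceding sentences."""
    # Phase 1: accept the best in-sentence candidate only if its score is positive.
    if same_sentence:
        best = max(same_sentence, key=lambda c: intra_score(c, target))
        if intra_score(best, target) > 0:
            return best, "intra"
    # Phase 2: search the preceding sentences (no threshold assumed here).
    if preceding:
        best = max(preceding, key=lambda c: inter_score(c, target))
        return best, "inter"
    return None, None


# Toy scores (hypothetical) for the evaluation "good".
intra = {"chicken": -0.2}
inter = {"curry": 0.9, "lunch": 0.1}
result = find_aspect_cascade(
    "good", list(intra), list(inter),
    lambda c, t: intra[c], lambda c, t: inter[c],
)
print(result)
```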
3.5
Opinion-hood
determination
Evaluation
phrases
do
not
always
extract
correct opinion
units
in
a
given
domain
.
Consider
an
example
from
the
digital
camera
domain
,
"
The
weather
was
good
,
so
I
went
to
the
park
to
take
some
pictures
"
.
"
good
"
expresses
the
evaluation
for
"
the
weather
"
,
but
"
the
weather
"
is
not
an
aspect
of
digital
cameras
.
Therefore
,
⟨the weather, good⟩
is
not
an
opinion
in
the
digital
camera
domain
.
We
can
consider
a
binary
classification
task
of
judging
whether
the
obtained
opinion
unit
is
a
real
opinion
or
not
in
a
given
domain
.
In
this
paper
,
we
conduct
a
preliminary
experiment
which
uses
the
opinion-hood
determination
model
learned
by
Support
Vector
Machines
.
We
train
the
model
using
our
opinion-annotated
corpus
.
The
positive
examples
are
aspect-evaluation
pairs
annotated
in
the
corpus
.
The
negative
examples
are
artificially
generated
as
follows
:
We
first
identify
the
expressions
in
the
evaluation
dictionary
that
appear
in
our
annotated
corpus
.
We
then
apply
the
above
aspect-evaluation
extraction
method
and
get
the
most
plausible
candidate
aspect
.
The
result
is
regarded
as
a
negative
example
if
the
extracted
aspect
is
not
the
true
aspect
.
The
features
we
used
in
our
experiments
are
summarized
in
Table
2.
4
Experiments
We
conducted
experiments
with
our
Japanese
opinion-annotated
corpus
to
empirically
evaluate
the
performance
of
our
approach
.
In
these
experiments
,
we
separately
evaluated
the
models
of
aspect-evaluation
relation
extraction
,
aspect-of
relation
extraction
,
and
opinion-hood
determination
.
4.1
Common
settings
We
chose
395
weblog
posts
in
the
restaurant
domain
from
our
opinion-annotated
corpus
described
in
2.1
,
and
conducted
5-fold
cross
validation
on
that
dataset
.
As
preprocessing
,
we
analyzed
this
corpus
using
the
Japanese
morphological
analyzer
ChaSen
and
the
Japanese
dependency
structure
analyzer
CaboCha
.
The
results
are
summarized
in
Tables
3
and
4.
We
evaluated
the
results
by
recall
R
and
precision
P
defined
as
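follows :

$$R = \frac{\text{correctly extracted relations}}{\text{total number of relations}}, \qquad P = \frac{\text{correctly extracted relations}}{\text{total number of relations found by the system}}$$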
Note
that
,
in
aspect-of
relations
,
we
permit
⟨A, C⟩ to be correct when the data includes the chain of aspect-of relations ⟨A, B⟩ and ⟨B, C⟩.
Therefore
,
we
merged
the
intra
-
and
inter-sentential
results
as
shown
in
Table
4.
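The lenient matching of aspect-of relations along a chain can be sketched by taking the transitive closure of the gold relations before scoring. This is an illustration under assumptions: relations are written as (parent, child) pairs, the helper names are hypothetical, and recall is computed against the original gold relations, which may differ from the exact counting used in the experiments.

```python
def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def precision_recall(predicted, gold):
    """Score predictions, accepting any pair implied by a gold chain."""
    accepted = transitive_closure(gold)
    correct = sum(1 for pair in predicted if pair in accepted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data (hypothetical): restaurant > noodle > volume.
gold = {("restaurant", "noodle"), ("noodle", "volume")}
predicted = [("restaurant", "volume"), ("noodle", "restaurant")]
print(precision_recall(predicted, gold))
```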
Table 2:
Features for contextual clues
- Position of c/t in the sentence (beginning, end, other)
- Base phrase distance between c and t (1, 2, 3, 4, other)
- Whether c and t have an immediate dependency relation
- Whether c precedes t
- Whether c appears in a quoted sentence
- Part-of-speech of c/t
- Suffix of c (-sei, -sa (-ty), etc.)
- Character type of c (English, Chinese, Katakana, etc.)
- Semantic class of c derived from Nihongo Goi Taikei (Ikehara et al., 1997)
Features for statistical clues
- Co-occurrence score rank of c (1st, 2nd, 3rd, 4th, other)
- Aspect-hood score rank of c (1st, 2nd, 3rd, 4th, other)
The
Contextual
and
Contextual+statistics
models
are
our
proposed
models
where
the
former
uses
only
contextual
clues
(
3.2.1
)
and
the
latter
uses
both
contextual
and
statistical
clues
.
We
prepared
two
baseline
models
,
one
for
each
of
the
above
tasks
.
The
Pattern
model
(
in
Table
3
)
simulates
the
pattern-based
method
proposed
by
Tateishi
et
al.
(
2004
)
,
which
uses
the
following
patterns
:
"⟨Aspect⟩ case-particle ⟨Evaluation⟩" and "⟨Evaluation⟩ syntactically depends on ⟨Aspect⟩"
.
The
Co-occurrence
model
(
in
Table
4
)
simulates
the
co-occurrence
statistics-based
model
used
in
bridging
reference
resolution
(
Bunescu
,
2003
)
:
For
an
aspect
expression
,
we
select
the
nearest
candidate
that
has
the
highest
positive
score
of
the
pointwise
mutual
information
regardless
of
its
occurrence
(
i.e.
inter- or
intra-sentential
)
.
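One possible reading of this baseline is sketched below: among candidates with a positive pointwise mutual information score, take the highest-scoring one and break ties in favour of the nearest. The function name, the (term, distance) representation and the toy scores are assumptions for illustration only.

```python
def baseline_parent(candidates, pmi_score):
    """candidates: list of (term, distance) pairs; returns the chosen term or None."""
    positive = [
        (pmi_score(term), -distance, term)
        for term, distance in candidates
        if pmi_score(term) > 0
    ]
    if not positive:
        return None
    # Highest score first; among equal scores, the smallest distance wins.
    return max(positive)[2]


# Hypothetical PMI scores with respect to some aspect expression.
toy_pmi = {"restaurant": 2.0, "menu": 2.0, "day": -1.0}
toy_candidates = [("restaurant", 3), ("menu", 1), ("day", 2)]
print(baseline_parent(toy_candidates, toy_pmi.get))
```

The candidate list mixes intra- and inter-sentential positions, reflecting that the baseline ignores where a candidate occurs.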
Comparing
the
Pattern
(
Co-occurrence
)
model
with
the
Contextual
model
shows
the
effects
of
the
supervised
learning
with
contextual
clues
,
while
comparison
of
the
Contextual
and
Contextual+statistics
models
shows
the
joint
effect
of
combining
contextual
and
statistical
clues
.
4.3
Results
and
discussions
As
for
the
aspect-evaluation
relation
extraction
,
concerning
the
intra-sentential
cases
,
we
can
see
that
the
models
using
the
contextual
clues
show
nearly
10%
improvement
in
both
precision
and
recall
.
This
indicates
that
the
machine
learning-based
method
has
a
great
advantage
over
the
pattern-based
approach
.
Similar
results
are
seen
in
aspect-of
relation
extraction
.
The
models
using
the
contextual
clues
achieved
more
than
10%
improvement
in
precision
and
20%
improvement
in
recall
over
the
co-occurrence
statistics-based
model
.
We
can
say
that
contextual
clues
are
also
useful
in
aspect-of
relation
extraction
.
In
comparing
the
Contextual
and
Contextual+statistics
models
,
on
the
other
hand
,
we
could
get
only
a
slight
improvement
,
which
indicates
that
we
need
to
estimate
the
statistical
clues
more
precisely
.
We
found
that
the
unsophisticated
estimation
of
the
statistical
clues
was
a
major
source
of
errors
in
aspect-of
relation
extraction
;
however
,
this
estimation
is
not
so
easy
since
the
correct
expressions
appear
only
once
in
large
data
.
We
are
seeking
efficient
ways
to
avoid
the data sparseness problem ( e.g. categorizing
the
aspects
)
.
In
the
aspect-evaluation
relation
extraction
,
we
evaluated
the
results
against
the
human
annotated
gold-standard
in
a
strict
manner
.
However
,
according
to
our
error
analysis
,
some
of
the
errors
can
be
regarded
as
correct
for
some
real
applications
.
In
the
following
example
,
a
relation
annotated
by
the
human
is
"
aji
(
taste
)
,
koi-me
(
strong
)
"
.
misoshiru-wa ⟨aji⟩-ga ⟨koi-me⟩
miso soup- ⟨taste⟩- ⟨strong⟩
(The taste of the miso soup is strong.)
However
,
there
is
no
harm
in considering
that
"
misoshiru
(
miso
soup
)
,
koi-me
(
strong
)
"
is
also
correct
.
If
we
judge
these
cases
as
correct
,
the
proposed
models
achieve
nearly
0.8
precision
and
0.7
recall
,
and
the
baseline
model
also
gets
7
%
improvement
(
precision
0.63
and
recall
0.6
)
.
Based
on
this
result
,
we
consider
that
we
achieved
reasonable
performance
in
intra-sentential
aspect-evaluation
relation
extraction
.
As
Table
3
shows
,
inter-sentential relation extraction performed very poorly
.
In
the
case
of
inter-sentential
relations
,
our
model
tends
to
rely
heavily
on
the
statistical
clues
,
because
syntactic
pattern
features
cannot
be
used
.
However
,
our
current
method
for
estimating
co-occurrence
distributions
is
not
sophisticated
as
we
discussed
above
.
We
need
to
seek
more
effective
use
of
large-scale domain-dependent
data
to
obtain
better
statistics
.
We
also
conducted
a
preliminary
test
of
the
opinion-hood
determination
model
using
the
features
used
in
aspect-evaluation
relation
extraction
.
The opinion-hood
determination
problem
includes
two
decisions
:
whether
the
evaluation
candidate
is
an
opinion
or
not
,
and
whether
the
opinion
is
related
to
the
given
domain
if
the
evaluation
candidate
is
an
opinion
.
We
plan
to
use
various
features
known
to
be
effective
in
the
sentence
subjectivity
recognition
task
.
This
task
involves
challenging
problems
.
For
example
,
sentence
(
1
)
includes
the
writer
's
evaluation
on
shrimps
served
at
a
particular
restaurant
.
In
contrast
,
very
similar
sentence
(
2
)
does
not
express
evaluation
since
it
is
a
generic
description
of
the
writer
's
taste
.
(1) watashi-wa konomise-no ebi-ga suki-desu
    I the restaurant shrimp like
    (I like shrimps of the restaurant.)
(2) watashi-wa ebi-ga suki-desu
    I shrimps like
    (I like shrimps.)
Thus
we
need
to
conduct
further
investigation
in order to resolve this kind of problem .
4.4
Portability
of
intra-sentential
model
We
next
evaluated
the effectiveness of the contextual clues learned in one domain when applied to other domains, by testing a model trained on one domain on data from another domain
.
We
selected
two
new
domains
,
cellular
phone
and
automobile
,
and
annotated
290
weblog
posts
in
each
domain
.
For
the
restaurant
domain
,
we
randomly
selected
290
posts
from
our previously mentioned
annotated
corpus
.
We
then
divided each data set into
a
training
set
and
a
test
set
so
that
we
had
the
same
amount
of
training
data
for
each
domain
.
Then
we
trained
a
model
on
the
data
for
each
domain
,
and
applied
it
to
each
of
the
three
sets
of
data
.
Table
5
shows
the
results
of
the
experiment
.
Compared
with
the
model
trained
on
the
same
domain
,
the
models
trained
on
different
domains
exhibited
almost
comparable
performance
.
This
indicates
that
the
contextual
clues
learned
in
other
domains
are
effective
in
another
domain
,
showing
the
cross-domain
portability
of
our
intra-sentential
model
.
5
Conclusion
In
this
paper
,
we
described
our
opinion
extraction
task
,
which
extracts
opinion
units
consisting
of
four
constituents
.
We
showed
the
feasibility
of
the
task
definition
based
on
our
corpus
study
.
We
considered
the
task
as
two
kinds
of
relation
extraction
tasks
,
aspect-evaluation
relation
extraction
and
aspect-of
relation
extraction
,
and
proposed
a
machine
learning-based
method
which
combines
contextual
clues
and
statistical
clues
.
Our
experimental
results
show
that
the
model
using
contextual
clues
improved
the
performance
for
both
tasks
.
We
also
showed
domain
portability
of
the
contextual
clues
.
