evaluation metric 0.0028475
metric evaluation 0.0028475
evaluation metrics 0.00268512
reference translation 0.00256006
human judgments 0.002303996
human judgment 0.002143179
output evaluation 0.002090559
human judg 0.002089834
based evaluation 0.002009792
english translation 0.00200631
manual evaluation 0.001970345
reference translations 0.001952858
evaluation setup 0.001947074
machine translation 0.001934335
reference words 0.001889029
age translation 0.001848082
translation perfor 0.001818738
bleu score 0.001775288
feature matching 0.001753959
phrase matching 0.001714027
arabic reference 0.001683797
source sentence 0.001674256
teor metric 0.001635475
language bleu 0.001616847
evaluation 0.00158769
ranking translations 0.001566701
agreement score 0.00156347
first reference 0.001558009
same task 0.001544114
partial matching 0.001508562
able metrics 0.001482721
model phrase 0.001477013
translation 0.00147109
arabic translations 0.001458715
reliable metrics 0.001451068
lexical features 0.001445424
reference tokens 0.001425046
uation metrics 0.001423482
arabic sentence 0.001410403
lexical information 0.001401996
exact matching 0.001391139
arabic language 0.00137739
matching criterion 0.001374581
agreement scores 0.0013698
stem matching 0.001341599
ranking sentences 0.001338402
exhaustive matching 0.001300436
syntactic feature 0.001297064
average agreement 0.001293256
same annotator 0.001285444
same direc 0.001285444
such criteria 0.001275148
research problem 0.001269258
ranking judgments 0.001261689
metric 0.00125981
linguistic analysis 0.001250011
language pairs 0.001245971
bleu criterion 0.001236923
syntactic features 0.001235417
building systems 0.001233375
further research 0.001212231
complex words 0.001197612
rich language 0.001193465
ameana system 0.001192869
est correlation 0.001192799
bleu computation 0.001189227
source france 0.001183563
annotation quality 0.001174963
linguistic knowledge 0.001163063
ranking tasks 0.001162134
morphological features 0.001158145
bilingual phrase 0.001151647
european language 0.001146719
average values 0.001140152
different exper 0.001139744
pairwise agreement 0.001139047
large sample 0.001133711
strict word 0.001128256
age scores 0.001124326
meteor scores 0.001108505
word delimiters 0.001108426
several studies 0.001106913
linguistic knowl 0.001106297
judgments dataset 0.001102062
metrics 0.00109743
arabic lin 0.001096054
reference 0.00108897
lexical levels 0.001083257
phrase synonyms 0.001082182
research community 0.001074805
news stories 0.001074651
phrase tables 0.001073191
news topics 0.001070053
high annotation 0.001069524
morphological level 0.001069101
challenging research 0.001063914
arabic machine 0.001058072
average correla 0.001055019
ical features 0.00104754
annotation process 0.001034054
