evaluation metrics 0.00305239
metric evaluation 0.00277208
automatic metrics 0.00247783
other metrics 0.002348088
metrics task 0.002331848
automatic metric 0.00219752
shared metrics 0.00215038
standard metric 0.002142543
tomatic metrics 0.002127048
metric scores 0.002125474
tween metrics 0.002085155
paired metrics 0.002071233
peting metrics 0.002069856
human score 0.002023901
human scores 0.001928324
metric pairs 0.001913905
new metric 0.001892828
metric tests 0.001885685
baseline metric 0.001851551
evaluation task 0.001795698
metrics 0.00179427
metric mimics 0.001791848
machine translation 0.001703992
human judgments 0.001685947
translation quality 0.001658782
human judgment 0.001652258
human assessment 0.00163959
human judg 0.001603792
focus evaluation 0.001533314
metric 0.00151396
chine translation 0.001483559
linear correlation 0.001451238
rank correlation 0.001412925
significance test 0.001391323
same data 0.001353919
other language 0.001346207
tion correlation 0.001317433
human 0.00131681
automatic scores 0.001295074
data set 0.001282221
statistical test 0.001281938
evaluation 0.00125812
williams test 0.00122911
test matrices 0.001226615
specific test 0.001203605
translation 0.00120066
ter bleu 0.001197385
language pairs 0.001192334
pearson correlation 0.001190789
absolute correlation 0.001186366
score differences 0.001174009
population correlation 0.001167872
meteor bleu 0.001159525
significance figure 0.001104924
language pair 0.001092888
terrorcat bleu 0.001074925
amber bleu 0.001074234
simpbleu bleu 0.001059734
dependent data 0.001052483
data sets 0.001036019
outperform bleu 0.001014106
absolute score 0.001002979
correlations correlations 0.000991052
system quality 0.000985623
terrorcat figure 0.000955284
standard practice 0.000953303
statistical power 0.00092836
ric scores 0.000912502
same reason 0.000908364
standard justification 0.000906994
discussion figure 0.000900227
raw scores 0.000896225
significance results 0.000895287
correlation 0.000890478
meteor ter 0.000882552
form significance 0.000876998
such cases 0.000872448
ter ways 0.000871728
statistical significance 0.000865387
significant differences 0.000862163
significance tests 0.000859111
based ones 0.000842804
current method 0.000839187
individual correlations 0.000815148
williams significance 0.000812559
significance testing 0.000801484
ter amber 0.000797261
relative degree 0.000797129
novel significance 0.000793035
language 0.000792389
correlations changes 0.000786496
relative merits 0.000782495
cant differences 0.000781077
pendent correlations 0.000778592
ter posf 0.000769319
tistical significance 0.000764073
sagan ter 0.000761675
terrorcat meteor 0.000760092
meteor terrorcat 0.000760092
wberrcats ter 0.000753774
