human evaluation 0.00385619
evaluation metrics 0.00334434
evaluation metric 0.00331227
translation evaluation 0.00315528
automatic evaluation 0.002728252
phrase evaluation 0.002595221
human scores 0.002591661
evaluation task 0.002546728
first evaluation 0.002494146
evaluation work 0.002445672
evaluation methods 0.002429746
evaluation tasks 0.00241359
human fluency 0.00240546
human judges 0.002343052
man evaluation 0.002319172
automated evaluation 0.002315273
human judgments 0.002306295
evaluation algorithm 0.002305602
human adequacy 0.002297322
human annotators 0.002219691
human evaluations 0.002192751
human judg 0.002182784
human repair 0.002171432
automatic metrics 0.002096072
such metrics 0.002048237
evaluation 0.00198826
different language 0.00198629
bleu metric 0.001917495
human 0.00186793
word model 0.001806547
multiple metrics 0.001734689
different systems 0.001729562
tomatic metrics 0.001719248
automated metrics 0.001683093
reference sentence 0.001680559
source language 0.001680086
matic metrics 0.001669318
ferent metrics 0.001661442
machine translation 0.001646924
language models 0.001631247
same system 0.00159469
translation edit 0.001567977
paraphrase data 0.001491241
target language 0.001490457
language generation 0.001480645
moderate correlation 0.001467761
perceptron model 0.001452581
correlation coefficient 0.001447122
different ranking 0.001445524
rank correlation 0.001428394
model comparisons 0.00142816
reference sentences 0.001413362
words figure 0.001406694
little correlation 0.001406435
sentence types 0.001402304
natural language 0.001398604
different length 0.001395547
weak correlation 0.001392026
different input 0.001386943
erence sentence 0.001384422
inverse correlation 0.001383189
pothesis sentence 0.001378648
paired sentence 0.001378648
openccg model 0.001377368
other system 0.001363722
different realizer 0.001362549
ranking systems 0.001361366
metrics 0.00135608
guage model 0.001342358
lexical items 0.001336387
lexical choice 0.001329309
same realizer 0.001326731
metric 0.00132401
bleu scores 0.001317216
different versions 0.001288278
treebank data 0.001288247
realizer systems 0.001278391
machine translations 0.001277539
lexical categories 0.001262682
fluency scores 0.001261261
lexical smooth 0.001257619
paraphrase matching 0.001224392
average scores 0.001212273
data sources 0.001189151
same trigram 0.001185314
further research 0.00118247
realizer system 0.001179337
same concept 0.001179116
wordnet system 0.001167985
translation 0.00116702
acceptable translations 0.001157885
adequacy scores 0.001153123
surface realizations 0.001152571
surface realizer 0.001151637
data tables 0.001146118
data preparation 0.001143681
ptb data 0.001143681
such error 0.001132867
ual systems 0.001131922
particular test 0.00113154
