same number 0.001432631
first answer 0.00132433
same value 0.001288872
other words 0.001286278
answer validation 0.0012355
data set 0.001225678
candidate answer 0.001218895
answer extraction 0.001205318
other value 0.001191737
same algorithm 0.001168348
validation evaluation 0.001163934
evaluation measure 0.001161594
same accuracy 0.001147874
correct answer 0.001138696
different languages 0.00112912
same reasoning 0.001124299
several questions 0.001120025
other measures 0.001091067
business model 0.001078025
different metrics 0.001074725
evaluation measures 0.001074335
different propor 0.001046201
same size 0.001031271
candidate answers 0.001029612
wrong answer 0.001016244
ing evaluation 0.001007993
same time 9.9881E-4
same propor 9.94017E-4
same runs 9.92517E-4
same ranking 9.92174E-4
same proportion 9.89847E-4
new measure 9.848E-4
natural language 9.82385E-4
main evaluation 9.78549E-4
new features 9.7063E-4
new candidate 9.70535E-4
question sets 9.65497E-4
answer ranking 9.49873E-4
correct answers 9.49413E-4
evaluation task 9.37287E-4
validation module 9.37073E-4
language evalu 9.35097E-4
first step 9.32711E-4
other extension 9.27935E-4
unanswered questions 9.21003E-4
language processing 9.19203E-4
accuracy value 9.11452E-4
other tasks 9.1026E-4
cross language 9.06527E-4
first bin 9.02186E-4
other hand 9.01272E-4
accuracy measure 8.98041E-4
several measures 8.96615E-4
other estimations 8.94906E-4
other estimation 8.94906E-4
evaluation collection 8.93437E-4
entire evaluation 8.92615E-4
evaluation forum 8.91691E-4
data collection 8.90413E-4
incorrect answers 8.82512E-4
swered questions 8.81886E-4
question answering 8.80358E-4
qualitative evaluation 8.76565E-4
evaluation methodology 8.76565E-4
sonable evaluation 8.76565E-4
classification problem 8.71715E-4
similar results 8.71373E-4
each question 8.64798E-4
systems performance 8.5805E-4
generation problem 8.51868E-4
model 8.51394E-4
test collections 8.44374E-4
first eval 8.35691E-4
system ability 8.35638E-4
first summand 8.32122E-4
system dis 8.27855E-4
cheating system 8.27855E-4
wrong answers 8.26961E-4
constant value 8.1469E-4
complete value 8.00279E-4
entailment problem 7.99735E-4
following way 7.9713E-4
utility measure 7.94633E-4
confidence value 7.94108E-4
final value 7.93307E-4
certain measure 7.92242E-4
negative value 7.85273E-4
sensible measure 7.82708E-4
answers selection 7.80264E-4
validation modules 7.80097E-4
difference performance 7.77812E-4
performance difference 7.77812E-4
arbitrary value 7.76738E-4
similar example 7.76408E-4
unique measure 7.75415E-4
positive value 7.67715E-4
fuzziness value 7.66331E-4
eral answers 7.66099E-4
date answers 7.65434E-4
ation measure 7.60582E-4
