training data 0.00305587
test data 0.00277799
unseen data 0.002290052
probability model 0.002251757
ing data 0.002246359
random data 0.002229871
backoff model 0.002117008
data size 0.002108335
data examples 0.002087836
smoothing model 0.002087414
single data 0.002056684
word corpus 0.002054216
nyt data 0.002026384
labeled data 0.001997531
different training 0.001992458
data point 0.001987709
development data 0.001977646
data experiment 0.001974679
ing training 0.001959249
effect data 0.001929955
apw data 0.001915817
preference model 0.001854815
small training 0.001846353
training size 0.001821225
erk model 0.001814416
model implementation 0.001813571
google model 0.0018083
guage model 0.001805128
training examples 0.001800726
test sentence 0.001753312
training arguments 0.001741885
nyt training 0.001739274
test set 0.00165344
ferent training 0.001652569
test pairs 0.001632529
training sizes 0.001629714
ditional training 0.001624274
varying training 0.001621118
unseen words 0.001601516
word sense 0.001598224
language tasks 0.001553736
random words 0.001541335
word evaluations 0.001537657
test argument 0.001532923
model 0.00153204
test examples 0.001522846
different corpus 0.001519344
other models 0.001506171
corpus frequency 0.001498159
test corpora 0.001498001
similar corpus 0.001490973
test pair 0.001482008
original word 0.001468313
nyt test 0.001461394
language modeling 0.001448871
test sets 0.001444669
labeled test 0.001432541
test documents 0.001415758
head word 0.00140382
training 0.00138438
randomization test 0.00138192
word creation 0.001381047
ternative word 0.001381047
probability baseline 0.001379096
language processing 0.001378165
test example 0.00137526
dom test 0.001354094
test docu 0.001343435
sentative test 0.001343435
results results 0.00134261
natural language 0.001334361
rare words 0.001330868
web corpus 0.00128192
different results 0.001279383
ambiguous words 0.001268518
gigaword corpus 0.001257965
head words 0.001243824
similar baseline 0.001239086
unambiguous words 0.001229241
corpus frequencies 0.001225076
partner words 0.001224446
pseudo words 0.001223275
national corpus 0.001218027
corpus differences 0.001216452
corpus fre 0.001208616
broad corpus 0.001198389
different approaches 0.001174867
other nlp 0.001170868
bnc corpus 0.001163006
individual corpus 0.001161582
likely corpus 0.001161531
new baseline 0.001154595
gaword corpus 0.001148947
corpus files 0.001148947
frequency random 0.001145274
unseen pairs 0.001144591
simple baseline 0.001127175
smoothing approaches 0.001122163
similarity scores 0.001108453
evaluation results 0.001075025
