language model 0.00341145
training data 0.00280048
test data 0.00277052
corpus language 0.0025264
discounting model 0.002516825
model gdlm 0.002421449
model level 0.002418977
language model 0.002413758
jackknife model 0.002398405
trained model 0.002397903
news data 0.002367896
training corpus 0.00235858
test corpus 0.00232862
model 0.00213958
times data 0.002118634
newswire data 0.002033332
data sparsity 0.001962359
afp data 0.001959441
train corpus 0.001874677
discount language 0.001843567
training count 0.001792669
new language 0.00172173
language modeling 0.001706798
discounts language 0.001705751
word error 0.001705351
training counts 0.001691377
test counts 0.001661417
corpus pairs 0.00164041
training corpora 0.001626294
corpus divergence 0.00160447
corpus pair 0.001599335
test corpora 0.001596334
corpus sizes 0.001550504
results perplexity 0.001536171
perplexity results 0.001536171
jackknife language 0.001530695
eral corpus 0.001518132
Gigaword corpus 0.001514639
corpus divergences 0.001513596
divergent corpus 0.001513596
overall word 0.001486791
entire training 0.001428841
different train 0.001410008
perplexity evaluation 0.001379524
training fragments 0.001375871
set perplexity 0.001365611
different set 0.001311503
different approach 0.001282564
perplexity values 0.001276144
language 0.00127187
corpus 0.00125453
same discount 0.001253129
same size 0.001237662
perplexity improvements 0.001229448
discount function 0.0012155
different methods 0.001166495
document text 0.001160329
linear function 0.001143882
final perplexity 0.001142788
different interpolation 0.00114278
other features 0.001137562
trigram models 0.001129306
empirical probability 0.001117147
perplexity gains 0.001111558
show perplexity 0.001104676
training 0.00110405
similar train 0.001089276
different years 0.001086023
evaluation corpora 0.001057799
explicit models 0.001052402
only feature 0.001052271
similar discount 0.001040826
additional corpora 0.001036089
trigram count 0.001035373
empirical discount 0.001022553
words 0.00101549
count range 0.001011626
news articles 0.001005605
miscellaneous text 0.001004602
single discount 0.001002752
other scheme 0.001001369
ing domain 0.001001103
gram count 0.001000992
same number 0.00099678
token counts 0.000992861
similar corpora 0.000991373
count phenomenon 0.000989542
other criteria 0.000980825
domain difference 0.000980356
history count 0.000978468
constant function 0.000973529
single value 0.000970395
count level 0.000968016
count curve 0.000967907
count buckets 0.00096609
count truncation 0.000950148
parametric function 0.00094435
probability mass 0.000940577
type counts 0.000923355
discounts figure 0.000913025
