Phrase                    Score
language model            0.00370202
language models           0.00330649
training data             0.00285056
test data                 0.002665579
task language             0.00265099
ing language              0.002630513
standard language         0.002611747
generalized language      0.002544744
trained language          0.002529052
traditional language      0.00250908
language mod              0.002504157
data set                  0.002498657
kyoto language            0.002497486
classical language        0.00247768
dard language             0.002476339
language modelling        0.002474861
eralized language         0.002474243
different model           0.002462095
same data                 0.002353906
large data                0.002249441
model interpolation       0.002246095
ing data                  0.002223053
evaluation data           0.002218876
language                  0.00219634
wikipedia data            0.002172017
domain data               0.00217047
model approach            0.002148347
small data                0.002132697
order model               0.002128147
data sparsity             0.002121496
data sets                 0.002111907
ent data                  0.002088165
all data                  0.002078424
sparse data               0.002076237
glm model                 0.00207426
different words           0.002006655
training corpus           0.001955435
various model             0.001949167
model toolkit             0.001833109
unigram model             0.00183116
model length              0.001829294
word order                0.001826667
der model                 0.001801365
prediction model          0.001792504
sub model                 0.001788868
model lengths             0.001788434
model orders              0.001783826
model implementa          0.001783826
test corpus               0.001770454
order models              0.001732617
text corpus               0.001708254
level models              0.001679354
first word                0.001638095
test case                 0.001622289
word sequence             0.001599973
similar words             0.001583624
word relations            0.001582156
next word                 0.0015702
word tokens               0.001557712
word distributions        0.001556973
word combinations         0.00154156
word statistics           0.001537907
unique words              0.001537484
word sequences            0.001528057
models mkn                0.001507609
full training             0.001506487
model                     0.00150568
rare word                 0.001498586
word prediction           0.001491024
guage models              0.001488499
word forms                0.001483088
zipfian word              0.001483088
gram models               0.001467429
total words               0.00144735
different languages       0.001441872
training corpora          0.001433052
der models                0.001405835
small training            0.001405497
same results              0.001404321
bigram models             0.00138969
models pglm               0.001388039
models leverage           0.001388039
probability estimates     0.001372805
preceding words           0.001363113
unseen words              0.001359325
true probability          0.001351554
sparse training           0.001349037
probability mass          0.001348786
sufficient training       0.001343232
sensitive training        0.001341019
discounted probability    0.001340975
consecutive words         0.00133141
functional words          0.001329849
mio words                 0.001329849
corpus size               0.001304085
english text              0.001295762
different discount        0.001293919
wikipedia corpus          0.001276892
same set                  0.001274803
different weights         0.001274575
