word distribution 0.002012934
language model 0.001947639
word frequency 0.001758747
language models 0.001679488
random word 0.001667132
word error 0.001635714
word count 0.001633458
next word 0.001629021
word sequence 0.001592423
simple model 0.001418791
unigram model 0.001418556
topic model 0.001374568
certain model 0.001322584
corpus time 0.001318664
size corpus 0.001317956
corpus size 0.001317956
sophisticated model 0.001283385
lda model 0.001282634
data sets 0.001268989
frequency function 0.001239832
training corpus 0.001231428
same distribution 0.001230754
probabilistic language 0.001228246
perplexity perplexity 0.001227506
frequency words 0.001216013
original corpus 0.001194148
language mod 0.001184178
newsgroups data 0.001180236
form distribution 0.001156699
tical language 0.00115318
unigram models 0.001150405
test corpus 0.001134331
gram models 0.001119551
models figure 0.001106667
topic models 0.001106417
feature space 0.001094575
same perplexity 0.001094383
corpus com 0.001074352
predictive distribution 0.0010523
quadratic function 0.001044086
perplexity value 0.001042964
journal corpus 0.00103954
guage models 0.001037686
model 0.00103225
synthetic corpus 0.001029707
uniform distribution 0.001027778
zeta function 0.001022211
corpus gen 0.001020203
nal corpus 0.001019059
same size 0.001018943
ideal corpus 0.001017229
duced corpus 0.001017229
memory perplexity 0.001015335
lda models 0.001014483
search directions 0.001005833
different nlp 0.001005689
computational time 0.00099043
elementary function 0.000989814
stable distribution 0.00098806
distribution ϕˆz 0.00098806
dictive distribution 0.00098806
decay function 0.000986005
heuristic function 0.000983709
same frequency 0.000976567
different test 0.000971904
real perplexity 0.000970907
lda learning 0.000966956
theoretical perplexity 0.000946582
time memory 0.000940603
memory size 0.000939895
many possibilities 0.000934885
many stud 0.000932194
many datasets 0.000932194
many trials 0.000932194
powerful method 0.000930725
language 0.000915389
perplexity measures 0.000907652
first part 0.000902212
gram size 0.000893765
above assumption 0.000888903
different sizes 0.000886138
maximum frequency 0.000880862
acceptable perplexity 0.000873595
perplexity growth 0.000864709
perplexity pˆp 0.000860363
perplexity pˆpk 0.000860311
perplexity pˆpmix 0.000853002
vocabulary size 0.000849978
smoothing factor 0.000830083
optimal smoothing 0.000826099
tional time 0.000816738
standard extension 0.00081522
similar fashion 0.000808475
original corpora 0.000806262
last part 0.000793898
same vocabulary 0.000792295
size w˜k 0.000791603
following formula 0.000790652
power law 0.00078773
large corpora 0.000784663
