language models 0.001571577
probability vector 0.001407825
word length 0.001389982
word frequency 0.001272046
word types 0.00124803
corresponding models 0.001247609
specific word 0.001208656
sample size 0.001183621
probabilistic model 0.001171054
other words 0.001151012
distinct word 0.001145911
unlikely word 0.001144361
word frequencies 0.001141667
average word 0.001133428
model intro 0.001112196
different values 0.001061978
vocabulary size 0.001053979
size estimates 0.00104336
probability distribution 0.001042262
models 9.87658E-4
distribution function 9.86844E-4
observation size 9.68815E-4
new sample 9.66453E-4
different corpora 9.58747E-4
size change 9.50858E-4
language corpora 9.48335E-4
probability distributions 9.45182E-4
ing language 9.43003E-4
different estimators 9.35483E-4
population size 9.23822E-4
times corpus 9.20002E-4
similar sample 9.19486E-4
sample size 9.11433E-4
other vocabulary 9.09551E-4
theoretical performance 9.01294E-4
vocabulary size 8.9636E-4
size scales 8.95028E-4
vocabulary size 8.93094E-4
model 8.89957E-4
unseen words 8.8466E-4
our estimator 8.7179E-4
different sizes 8.66643E-4
first step 8.62313E-4
empirical performance 8.6205E-4
natural language 8.61964E-4
small values 8.48018E-4
actual corpus 8.46768E-4
first term 8.44398E-4
malayalam language 8.3783E-4
different estima 8.36294E-4
performance guarantees 8.35479E-4
malayalam corpus 8.35334E-4
performance result 8.27383E-4
hindi language 8.27225E-4
parametric estimator 8.27027E-4
hindi corpus 8.24729E-4
vector 8.20777E-4
standard corpora 8.20035E-4
deterministic probability 8.12885E-4
consistent estimator 8.06174E-4
estimator com 8.01158E-4
other hand 7.97324E-4
art estimator 7.88784E-4
theoretical work 7.88615E-4
favorable performance 7.87805E-4
sample sizes 7.8581E-4
empirical performance 7.83249E-4
convergence assumption 7.82906E-4
nonparametric estimator 7.80232E-4
following form 7.78377E-4
first study 7.77235E-4
novel estimator 7.7637E-4
power series 7.76196E-4
random variable 7.70338E-4
sentence length 7.67263E-4
arbitrary sample 7.49865E-4
metric estimator 7.47246E-4
turing estimator 7.39698E-4
ral estimator 7.3891E-4
simple calculation 7.35282E-4
baayen estimator 7.349E-4
chao estimator 7.349E-4
vocabulary growth 7.31423E-4
english corpora 7.31149E-4
finite vocabulary 7.29615E-4
following corpora 7.29456E-4
large corpora 7.28876E-4
many estimators 7.27113E-4
frequency distributions 7.20634E-4
error estimates 7.1581E-4
main results 7.11124E-4
following estimators 7.06192E-4
previous work 7.0485E-4
frequency information 6.964E-4
new york 6.95334E-4
lnre property 6.94374E-4
above estimators 6.93385E-4
same population 6.92923E-4
large number 6.84386E-4
tural distribution 6.77476E-4
