training data 0.00257946
lnre model 0.002391433
frequency models 0.002333192
growth model 0.002306023
other models 0.002284745
gigp model 0.002260417
statistical models 0.002253706
model partition 0.002247816
sampling model 0.002223382
corpus data 0.002205821
models lnre 0.002200253
lnre models 0.002200253
zipfr model 0.002180464
model predictions 0.002164846
lognormal model 0.002158795
model fitting 0.002158568
trained model 0.002144502
intractable model 0.002144071
simplistic model 0.002144071
test data 0.002109538
distribution models 0.002104562
gigp models 0.002069237
sampling models 0.002032202
training corpus 0.002010141
same data 0.001987989
random word 0.001971498
unadjusted models 0.001966522
word frequency 0.001965712
viable models 0.001962981
sophisticated models 0.001957131
corrected models 0.001952703
model 0.00192154
unseen data 0.00191636
large data 0.001871407
same training 0.001792309
training set 0.001746665
models 0.00173036
data sets 0.0017127
function words 0.001668703
newspaper data 0.001666248
content word 0.001626713
data one 0.001615152
data sug 0.001609876
dicted data 0.001609876
pus data 0.001609876
frequency words 0.001563456
other words 0.001515009
different types 0.001454967
vocabulary words 0.001438479
separate training 0.00142251
prediction size 0.001401775
different approach 0.001385313
general words 0.001306188
prediction performance 0.001292823
test set 0.001276743
different sizes 0.001269913
different documents 0.001241779
other prediction 0.001240554
ical words 0.001239648
sample size 0.001230276
topical words 0.001228867
content words 0.001224457
different interpretation 0.001219131
large size 0.001199443
repeated words 0.001198252
different degrees 0.001196018
vocabulary size 0.001193461
training 0.00119189
german words 0.001191359
repubblica corpus 0.001140562
random sample 0.001123288
original corpus 0.00110308
newspaper corpus 0.001096929
national corpus 0.001080818
full frequency 0.001053883
test sets 0.001047098
unprocessed corpus 0.001042165
dewac corpus 0.001042165
standard lnre 0.001041798
prediction accuracy 0.001030642
many tokens 0.001025719
relative error 0.001018316
prediction rmse 0.00100683
many types 9.98717E-4
size increases 9.95858E-4
prediction sizes 9.84015E-4
frequency distribution 9.77034E-4
new types 9.70696E-4
density function 9.69541E-4
other approaches 9.65933E-4
words 9.60624E-4
certain size 9.56E-4
random gigp 9.47495E-4
additional parameters 9.43977E-4
cabulary size 9.4127E-4
prediction quality 9.41185E-4
arbitrary size 9.4108E-4
ulary size 9.38408E-4
ing set 9.383E-4
random order 9.29339E-4
