training data 0.00268623
training document 0.00266583
training documents 0.0022944899999999997
document similarity 0.0022012769999999997
document location 0.002155274
test document 0.002086205
large document 0.00204806
ing document 0.002007942
document geolocation 0.001992334
wikipedia data 0.001971849
single document 0.001950598
data structure 0.001930067
training set 0.001883564
similar document 0.001848029
data points 0.001798476
document collections 0.001776258
data increases 0.001768276
low document 0.001767173
large training 0.0017537899999999999
document geoloca 0.0017468689999999999
cial data 0.001742507
tra data 0.001742507
cessed data 0.001742507
plentiful data 0.001742507
document density 0.0017275819999999999
large documents 0.00167672
location method 0.0016585200000000001
ing documents 0.001636602
user training 0.001608432
wikipedia documents 0.001580109
few training 0.001561097
similar training 0.001553759
specific documents 0.001545582
small training 0.001511413
training sets 0.001510428
training images 0.001503906
training docu 0.001484725
few documents 0.001484027
document 0.00148005
similar documents 0.001476689
sufficient training 0.001476177
ing model 0.001446132
training tokens 0.0014376459999999999
training subset 0.001426784
available documents 0.0014063589999999998
nearby documents 0.0013995869999999999
method parameters 0.00138231
historical documents 0.001352306
word error 0.001345354
error word 0.001345354
kdcentroid method 0.0013142190000000002
test set 0.001303939
decision method 0.001274566
midpoint method 0.001268877
division method 0.001257412
centroid method 0.001257021
friedman method 0.001250287
splitting method 0.0012487330000000001
language models 0.001248729
geotext method 0.0012412970000000001
turing method 0.001225598
text geolocation 0.0011975570000000001
training 0.00118578
bayesian model 0.001182345
similarity function 0.001178029
grid cell 0.001169424
discounting model 0.001168919
geographic location 0.001168284
unifkdcentroid model 0.0011653129999999999
geographic information 0.001157878
information retrieval 0.001153737
local words 0.001153341
error distance 0.001150074
location clustering 0.001130437
different models 0.0011277890000000001
error analysis 0.00112496
other methods 0.001122888
documents 0.00110871
strong language 0.001102095
language modeling 0.0011009919999999999
other terms 0.0010913770000000001
high accuracy 0.00108453
topic models 0.001083301
mean error 0.001079149
word distribution 0.0010747040000000001
labeled set 0.001069433
word features 0.001060699
true location 0.001060583
cell size 0.00105817
bayes classification 0.001049513
new test 0.001044775
predictive words 0.001032561
location selection 0.0010318670000000001
similar approach 0.001031187
eral language 0.001030664
learning curve 0.001030605
error dis 0.0010282569999999999
development set 0.001026262
median error 0.001025035
cosine similarity 0.001019802
