document similarity 0.00249838
simple term 0.002374375
term weighting 0.002340186
term weights 0.00233612
term weight 0.002325853
particular term 0.002267712
term contributions 0.002232078
ual term 0.002224835
similarity function 0.002026615
similarity score 0.001938732
similarity problem 0.001922769
similarity matrix 0.00191047
similarity algorithm 0.001893918
similarity measure 0.001883397
similarity scores 0.001786091
similarity functions 0.001783508
pairwise similarity 0.001764516
similarity measures 0.001759953
final similarity 0.001741582
entire similarity 0.001730575
computing similarity 0.001723683
similarity our 0.001712424
symmetric similarity 0.00169952
similarity phase 0.001693204
similarity computa 0.001673694
similarity compar 0.001663779
large document 0.001575245
document collection 0.001539684
document pairs 0.001529975
common terms 0.001523018
document frequency 0.001499561
related terms 0.001466856
similarity 0.00142978
pairwise document 0.001403336
rare terms 0.001400037
entropy terms 0.001380833
computing document 0.001362503
intermediate document 0.001355379
problematic terms 0.001351862
document collections 0.001342575
studies document 0.001332647
document simi 0.001310625
document frequen 0.001303663
word list 0.001286605
corpus size 0.001256998
data structure 0.001199699
different collection 0.001190326
vocabulary set 0.001184675
basic data 0.001166205
natural language 0.001135306
terms 0.00111601
data exchange 0.001109524
large number 0.001082366
document 0.0010686
same structure 0.001043816
text analysis 0.001005843
large class 9.65019E-4
space complexity 9.52766E-4
empirical results 9.39019E-4
text classification 9.34507E-4
information studies 9.22524E-4
disk space 9.2199E-4
collection size 9.21724E-4
output figure 9.15196E-4
total number 9.14003E-4
file system 8.88036E-4
many tasks 8.615E-4
vocabulary size 8.59834E-4
several machines 8.5902E-4
other aspects 8.58364E-4
language 8.39743E-4
several orders 8.36466E-4
several proces 8.36466E-4
time increases 8.30836E-4
mapreduce algorithm 8.28681E-4
partial results 8.27145E-4
space savings 8.26898E-4
space requirements 8.26898E-4
arbitrary number 8.1218E-4
corpus 8.06358E-4
analysis problems 7.9901E-4
many applications 7.8621E-4
large collections 7.8062E-4
documents 7.78966E-4
group values 7.74613E-4
entire collection 7.71879E-4
individual score 7.60661E-4
analytical model 7.56521E-4
intermediate key 7.5531E-4
score contributions 7.5061E-4
intermediate pairs 7.48154E-4
total vocabulary 7.47476E-4
experimental evaluation 7.40587E-4
map map 7.32134E-4
access patterns 7.28357E-4
our work 7.24331E-4
newswire text 7.2314E-4
collection sizes 7.19286E-4
weight vectors 7.16805E-4
similar problems 7.16564E-4
