word segmentation 0.00392513
segmentation algorithm 0.00341652
unsupervised segmentation 0.00290551
segmentation algorithms 0.002840219
new segmentation 0.002818299
precision segmentation 0.002724788
candidate segmentation 0.002709493
segmentation literature 0.002647828
segmentation quality 0.002638047
seed segmentation 0.002634688
rect segmentation 0.002580988
ery segmentation 0.002580105
segmentation liter 0.002580105
segmentation 0.00230402
unsupervised word 0.0022226
word boundaries 0.002027049
correct word 0.001972426
segmented corpus 0.001747446
description length 0.001632703
benchmark corpus 0.001579245
lexicon model 0.00157738
separate corpus 0.001565896
language cor 0.001548876
ratner corpus 0.001532636
different model 0.001514162
natural language 0.001501671
language corpora 0.001478959
mentation algorithm 0.001459055
unique words 0.001449843
sampling words 0.00143091
efficient algorithm 0.001412301
ticular language 0.001390553
tractable algorithm 0.001388928
tion length 0.001321294
likely model 0.00130936
model selection 0.001277513
model yields 0.001268801
other methods 0.001264345
information cost 0.001256677
length calculation 0.001242083
corpus 0.00123121
unsupervised mdl 0.001219902
data structure 0.001204556
other algo 0.001195805
sentence boundaries 0.001155681
mdl algorithms 0.001154611
words 0.00115305
other votes 0.001143816
unsupervised algorithms 0.001137689
language 0.00111405
algorithm 0.0011125
low description 0.001099407
prior segmentations 0.001085708
initial seed 0.001077589
boundary information 0.001060395
minimum description 0.00104512
small lexicon 0.00103344
true description 0.001020723
possible segmentations 0.001019691
mdl methods 0.00101802
human parameter 0.001015584
lexicon size 0.001015004
mdl framework 0.001005874
model 9.89741E-4
cost function 9.82236E-4
several algorithms 9.72706E-4
unsupervised settings 9.69843E-4
imum description 9.69653E-4
previous segmentations 9.6455E-4
length 9.58695E-4
minimal description 9.54851E-4
such algorithms 9.53479E-4
full description 9.51983E-4
candidate segmentations 9.47214E-4
future performance 9.34415E-4
unsupervised algo 9.32558E-4
mdl princi 9.25055E-4
information theory 9.2469E-4
previous work 9.14405E-4
possible boundary 9.12625E-4
vote threshold 9.04936E-4
data files 9.04552E-4
directed speech 9.01771E-4
mdl principle 8.97932E-4
small number 8.97343E-4
viable threshold 8.92264E-4
several results 8.7718E-4
general approach 8.44633E-4
internal entropy 8.34415E-4
potential segmentations 8.23069E-4
probability distribution 8.21153E-4
tential segmentations 8.18262E-4
diverse set 8.13752E-4
effective set 8.13752E-4
selection problem 8.12739E-4
seed knowledge 8.1124E-4
local maximum 8.06788E-4
previous section 7.99831E-4
unbroken sequence 7.85593E-4
source code 7.84961E-4
