word length 0.003434965
different word 0.0033496949999999998
other word 0.00321033
word form 0.0031806
new word 0.003156549
english word 0.003121966
possible word 0.003061874
common word 0.00302915
word types 0.003015061
segmented word 0.002948834
word sequences 0.002940922
finnish word 0.002910947
word forms 0.0028977029999999997
word discovery 0.002845889
word lengths 0.002841961
word winners 0.002837135
segmentation algorithm 0.002318988
different segmentation 0.002243495
text segmentation 0.002195636
segmented words 0.002115014
other segmentation 0.00210413
language model 0.002092652
new segmentation 0.002050349
splitting words 0.0020434750000000003
words none 0.002043077
entire words 0.002014747
segmentation algorithms 0.001982685
segmentation performance 0.001974643
segmentation methods 0.0019307810000000001
segmentation figure 0.0018843780000000001
text corpus 0.001839996
new model 0.001823729
segmentation algo 0.0017800490000000001
segmentation baseline 0.0017719510000000001
words 0.0017528
excessive segmentation 0.001733105
automatic segmentation 0.0017315210000000002
english corpus 0.001660126
model optimization 0.001618889
optimal model 0.0016102199999999999
probabilistic model 0.001580971
rent model 0.001550811
generative model 0.001548809
bilistic model 0.001508202
tic model 0.001506239
corpus size 0.00148216
segmentation 0.00148042
finnish corpus 0.0014491069999999998
corpus sizes 0.001432482
corpus the 0.001426424
description length 0.001371634
morph length 0.001362423
search algorithm 0.001316387
common length 0.001290875
statistical language 0.001277606
unsupervised algorithm 0.001270558
model 0.0012538
length distribution 0.001246165
english data 0.001233989
morphological analysis 0.00122947
test set 0.001228536
morph frequency 0.001213989
data set 0.0012035219999999998
natural language 0.001174254
real length 0.001165121
language processing 0.001154509
different languages 0.0011412129999999999
language modelling 0.001138049
corpus 0.00112478
mdl method 0.0011177539999999999
same morpheme 0.0011146799999999998
viterbi algorithm 0.0011057670000000001
prior information 0.001103992
tation algorithm 0.001100469
length distri 0.001099329
english language 0.001092169
ural language 0.001091434
morph lexicon 0.001086831
languages data 0.001076781
different elements 0.001076132
different forms 0.001074158
other methods 0.001074071
different sizes 0.001070777
small data 0.001067721
segmented data 0.0010608570000000001
test sets 0.0010520829999999999
input data 0.001041936
character string 0.0010294879999999998
data sets 0.001027069
development test 0.001024658
finnish data 0.00102297
same search 0.001013831
frequency axis 0.001009449
data sizes 0.001006345
initial order 0.001005157
newspaper text 0.001000523
test vocabulary 0.001000335
frequency distributions 0.000996727
frequency bins 0.0009937050000000001
morpheme representation 0.0009897439999999999
